Abstract

We present a randomized algorithm for estimating the $p$th moment $F_p$ of the frequency vector
of a data stream in the general update (turnstile) model to within a multiplicative factor of $1 \pm \epsilon$,
for $p > 2$, with high constant confidence. For $0 < \epsilon \le 1$, the algorithm uses space
$O(n^{1-2/p}\epsilon^{-2} + n^{1-2/p}\epsilon^{-4/p}\log(n))$ words. This improves over the current bound of
$O(n^{1-2/p}\epsilon^{-2-4/p}\log(n))$ words by Andoni et al. in [5]. Our space upper bound matches the lower
bound of Li and Woodruff [23] for $\epsilon = (\log(n))^{-\Omega(1)}$ and the lower bound of Andoni et al. [3]
for $\epsilon = \Omega(1)$.

1 Introduction


The data stream model is relevant for online applications over massive data, where an algorithm
may use only sub-linear memory and a single pass over the data to summarize a large data-set
that appears as a sequence of incremental updates. Queries may be answered using only the
data summary. A data stream is viewed as a sequence of $m$ records of the form $(i, v)$, where
$i \in [n] = \{1, 2, \ldots, n\}$ and $v \in \{-M, -M+1, \ldots, M-1, M\}$. The record $(i, v)$ changes the $i$th
coordinate $f_i$ of the $n$-dimensional frequency vector $f$ to $f_i + v$. The $p$th moment of the frequency
vector $f$ is defined as $F_p = \sum_{i \in [n]} |f_i|^p$, $p \ge 0$. The (randomized) $F_p$ estimation problem is:
Given $p$ and $\epsilon \in (0,1]$, design an algorithm that makes one pass over the input stream and returns
$\hat{F}_p$ such that $\Pr[|\hat{F}_p - F_p| \le \epsilon F_p] \ge 0.6$ (where the constant 0.6 can be replaced by any other
constant $> 1/2$). In this paper, we consider estimating $F_p$ for the regime $p > 2$, called the high
moments problem. The problem was posed and studied in the seminal work of Alon, Matias and
Szegedy in [1].
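
To make the problem statement concrete, the following minimal sketch computes $F_p$ exactly by storing the whole frequency vector, which is precisely what a streaming algorithm cannot afford. The function names `exact_moment` and `is_good_estimate` and the toy stream are illustrative assumptions, not part of the paper.

```python
from collections import defaultdict

def exact_moment(stream, p):
    """Exact F_p = sum_i |f_i|^p for a turnstile stream of (i, v) records.

    Uses linear space on purpose: it is a reference point, not a sketch.
    """
    freq = defaultdict(int)
    for i, v in stream:
        freq[i] += v  # the record (i, v) changes f_i to f_i + v
    return sum(abs(f) ** p for f in freq.values() if f != 0)

def is_good_estimate(estimate, truth, eps):
    """The accuracy condition |estimate - F_p| <= eps * F_p from the text."""
    return abs(estimate - truth) <= eps * truth

# Hypothetical toy stream: item 1 ends at 2, item 2 cancels to 0, item 3 ends at 4.
stream = [(1, 3), (2, -2), (1, -1), (3, 4), (2, 2)]
f3 = exact_moment(stream, 3)
print(f3, is_good_estimate(75, f3, 0.05))
```

The deletion of item 2 down to frequency zero is what distinguishes the turnstile model from insertion-only streams.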

Space lower bounds. Since a deterministic estimation algorithm for $F_p$ requires $\Omega(n)$ bits [1],
research has focused on randomized algorithms. Andoni et al. in [3]
present a bound of $\Omega(n^{1-2/p}\log(n))$ words assuming that the algorithm is a linear sketch. Li and
Woodruff in [23] show a lower bound of $\Omega(n^{1-2/p}\epsilon^{-2}\log(n))$ bits in the turnstile streaming model.
For linear sketch algorithms, the lower bound is the sum of the above two lower bounds, namely,
$\Omega(n^{1-2/p}(\epsilon^{-2} + \log(n)))$ words.

Space upper bounds. The table in Figure 1 chronologically lists algorithms and their properties
for estimating $F_p$ for $p > 2$ of data streams in the turnstile model. Algorithms for insertion-only
streams are not directly comparable to algorithms for update streams; however, we note that the
best algorithm for insertion-only streams is by Braverman et al. in [13] that uses $O(n^{1-2/p})$ bits,
for $p \ge 3$ and $\epsilon = \Omega(1)$.

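Contribution. We show that for each fixed $p > 2$ and $0 < \epsilon \le 1$, there is an algorithm for
estimating $F_p$ in the general update streaming model that uses space $O(n^{1-2/p}(\epsilon^{-2} + \epsilon^{-4/p}\log(n)))$
words, with word size $O(\log(nmM))$ bits. It is the most space economical algorithm as a function
of $n$ and $1/\epsilon$. The space bound of our algorithm matches the lower bound of $\Omega(n^{1-2/p}\epsilon^{-2}\log(n))$ bits of Li and
Woodruff in [23] for $\epsilon \le (\log n)^{-p/(2(p-2))}$, the range in which the $\epsilon^{-2}$ term dominates the $\epsilon^{-4/p}\log(n)$ term, and the lower bound $\Omega(n^{1-2/p}\log(n))$ words of Andoni
et al. in [3] for linear sketches and $\epsilon = \Omega(1)$.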


Algorithm     Space in O(·) words                                                  Update time O(·)
IW [20]       $n^{1-2/p}(\epsilon^{-1}\log(n))^{O(1)}$                             $(\log^{O(1)} n)(\log(mM))$
Hss [6]       $n^{1-2/p}\epsilon^{-2-4/p}\log(n)\log^2(nmM)$                       $\log(n)\log(nmM)$
MW [24]       $n^{1-2/p}(\epsilon^{-1}\log(n))^{O(1)}$                             $n^{1-2/p}(\epsilon^{-1}\log n)^{O(1)}$
AKO [5]       $n^{1-2/p}\epsilon^{-2-4/p}\log(n)$                                  $\log n$
BO-I [8]      $n^{1-2/p}\epsilon^{-2-4/p}\log(n)\log^{(c)}(n)$                     $\log n$
this paper    $n^{1-2/p}\epsilon^{-2} + n^{1-2/p}\epsilon^{-4/p}\log(n)$           $\log^2(n)$


Figure 1: Space requirement of published algorithms for estimating $F_p$, $p > 2$. Word-size is
$O(\log(nmM))$ bits for algorithms for update streams. $\log^{(c)}(n)$ denotes the $c$ times iterated logarithm
for $c = O(1)$.

Techniques and Overview. We design the Geometric-Hss algorithm for estimating $F_p$ that builds
upon the Hss technique presented in [6]. It uses a layered data structure with $L + 1 = O(\log n)$
levels numbered from 0 to $L$ and uses an $\ell_2$-heavy-hitter structure based on CountSketch [12] at
each level to identify and estimate $|f_i|^p$ for each heavy-hitter. The heavy-hitter structure at each
level has the same number $s = O(\log n)$ of hash tables, and all hash tables within a level have the same number of
buckets (the height of the table). The main new ideas are as follows. The height of any CountSketch table
at level $l$ is $\alpha^l$ times the height of any of the tables of the level 0 structure, where $0 < \alpha < 1$
is a constant. The geometric decrease ensures that the total space required is a constant times
the space used by the lowest level and avoids increasing space by a factor of $O(\log n)$ as in the
Hss algorithm.
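
The effect of the geometric decrease can be checked numerically. The sketch below compares the total table height over all levels when heights shrink by a factor $\alpha$ per level against equal heights at every level. The values `c0 = 1024`, `alpha = 0.9` and `L = 40` are purely illustrative assumptions and are not the paper's parameter settings.

```python
def total_buckets(c0, alpha, num_levels):
    """Sum of table heights c0 * alpha**l over levels l = 0..num_levels."""
    return sum(c0 * alpha ** l for l in range(num_levels + 1))

# Hypothetical values chosen only for illustration.
c0, alpha, L = 1024.0, 0.9, 40

geometric = total_buckets(c0, alpha, L)   # heights decrease geometrically
uniform = c0 * (L + 1)                    # same height at every level
bound = c0 / (1.0 - alpha)                # limit of the geometric series

print(geometric <= bound, geometric < uniform)
```

The geometric total never exceeds `c0 / (1 - alpha)`, a constant multiple of the level-0 height, whereas the uniform total grows linearly in the number of levels.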

In all previous works, an estimate for $|f_i|^p$ for a sampled item $i$ was obtained by retrieving
an estimate $\hat{f}_i$ of $f_i$ from the heavy-hitter structure of an appropriately chosen level, and then
computing $|\hat{f}_i|^p$. In order for $|\hat{f}_i|^p$ to be within $(1 \pm \epsilon)|f_i|^p$, $|\hat{f}_i - f_i|$ had to be constrained to be
at most $O(\epsilon|f_i|/p)$. By the lower bound results of [26], the estimation error for CountSketch is in
general optimal and cannot be improved. We circumvent this problem by designing a more accurate
estimator $\vartheta(\lambda, k)$ for $|f_i|^p$ directly. If $\lambda$ is an estimate for $|f_i|$ that is accurate to within a constant
relative error, that is, $\lambda \in (1 \pm O(1/p))|f_i|$, and there are independent, identically distributed and
unbiased estimates $X_1, X_2, \ldots, X_{O(k)}$ of $|f_i|$ with standard deviation $\sigma[X_j] \le O(|f_i|/p)$, then it is
shown that (i) $E[\vartheta(\lambda, k)] \in (1 \pm O(1/p)^k)|f_i|^p$, and (ii) $\mathrm{Var}[\vartheta(\lambda, k)] \le O(|f_i|^{2p-2}\sigma^2[X_j])$.

The estimator $\vartheta$ is designed using a Taylor polynomial estimator. Given an estimate $\lambda = |\hat{f}_i|$
for $|f_i|$ such that $\lambda \in (1 \pm O(1/p))|f_i|$, the $k + 1$ term Taylor polynomial estimator denotes
$\vartheta(\lambda, k) = \sum_{j=0}^{k} \binom{p}{j} \lambda^{p-j} (X_1 - \lambda)(X_2 - \lambda) \cdots (X_j - \lambda)$, where $X_1, \ldots, X_k$ are independent and
identically distributed estimators of $|f_i|$. Note that replacing the $X_j$'s by $|f_i|$ gives the expression
$\sum_{j=0}^{k} \binom{p}{j} \lambda^{p-j} (|f_i| - \lambda)^j$, which is the degree-$k$ Taylor polynomial expansion of $|f_i|^p$ around $\lambda$
(i.e., of $(\lambda + (|f_i| - \lambda))^p$). A new estimator $\bar{\vartheta}(\lambda, k, r)$ is defined as the average of $r$ dependent Taylor
polynomial estimators $\vartheta$, where each of these $r$ $\vartheta$-estimators is obtained from a certain $k$-subset
of random variables $X_1, \ldots, X_s$, with $s = O(k)$, and each $k$-subset is drawn from an appropriate
code and has a controlled overlap with another $k$-subset from the code. Note that now, only a
constant factor (i.e., within a factor of $1 \pm O(1/p)$) accuracy for the estimate $\lambda$ of $|f_i|$ is needed,
rather than the $O(\epsilon)$-accuracy needed earlier.

Finally, we note that the Hss algorithm [6] used full independence of hash functions and then
invoked Indyk's method [19] of using Nisan's pseudo-random generator to fool space-bounded
computations [25]. In our algorithm, we show that it suffices to use only limited $d = O(\log n)$-wise
independence of hash families, by changing the way the hash functions are composed.


Notation 


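Let $\mathbb{R}$ denote the field of real numbers, $\mathbb{N}$ denote the set of natural numbers, that is, $\mathbb{N} =
\{0, 1, 2, \ldots\}$, $\mathbb{Z}$ denote the ring of integers, and $\mathbb{Z}^+$ and $\mathbb{Z}^-$ denote the set of positive integers
and the set of negative integers respectively.

For $a \in \mathbb{R}$ and $s \in \mathbb{N}$, define the falling factorial

\[ a^{\underline{s}} = \begin{cases} a \cdot (a-1) \cdots (a-s+1) & \text{if } s \in \mathbb{Z}^+ \\ 1 & \text{if } s = 0 . \end{cases} \]

It follows that, (i) for $s_1, s_2 \in \mathbb{N}$, $a^{\underline{s_1+s_2}} = a^{\underline{s_1}} (a - s_1)^{\underline{s_2}}$, and (ii) for $a < 0$, $a^{\underline{s}} = (-1)^s(-a + s - 1)^{\underline{s}}$.
The notation $a^{\underline{s}}$ is taken from [27].

For $p \in \mathbb{R}$ and $k \in \mathbb{Z}$, denote

\[ \binom{p}{k} = \begin{cases} p^{\underline{k}}/k! & \text{if } p \in \mathbb{R} \text{ and } k \in \mathbb{N} \\ 0 & \text{if } p \in \mathbb{R} \text{ and } k \in \mathbb{Z}^- . \end{cases} \]

We use the following well-known identities for binomial coefficients, namely, the absorption identity:
$\binom{p}{k} = \frac{p}{k}\binom{p-1}{k-1}$, $k \ne 0$, and the upper negation identity: $\binom{p}{k} = (-1)^k \binom{k-p-1}{k}$, for integer
$k$.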
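
The falling factorial and the generalized binomial coefficient are easy to check numerically. The following sketch uses the hypothetical helper names `falling` and `binom`, and verifies the absorption and upper negation identities for one non-integral value of $p$.

```python
from math import factorial, isclose

def falling(a, s):
    """Falling factorial a (a-1) ... (a-s+1); equals 1 when s = 0."""
    out = 1.0
    for j in range(s):
        out *= a - j
    return out

def binom(p, k):
    """Generalized binomial coefficient for real p and integer k (0 if k < 0)."""
    if k < 0:
        return 0.0
    return falling(p, k) / factorial(k)

p, k = 2.5, 4
absorption_ok = isclose(binom(p, k), (p / k) * binom(p - 1, k - 1))
negation_ok = isclose(binom(p, k), (-1) ** k * binom(k - p - 1, k))
print(binom(p, k), absorption_ok, negation_ok)
```

For integral $p$ and $k > p$ the product contains a zero factor, so `binom(3, 5)` evaluates to zero; this fact is used later for integral moments.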



Review: Residual second moment and CountSketch algorithm 

Let $f \in \mathbb{Z}^n$ and let $\mathrm{rank} : [n] \to [n]$ be any permutation that orders the indices of $f$ in
non-increasing order by their absolute frequencies, that is, $|f_{\mathrm{rank}(1)}| \ge |f_{\mathrm{rank}(2)}| \ge \cdots \ge |f_{\mathrm{rank}(n)}|$. The
$k$-residual second moment of $f$ is denoted by $F_2^{res}(k)$ and is defined as $F_2^{res}(k) = \sum_{i \in [n], \mathrm{rank}(i) > k} f_i^2$.
We will use the CountSketch algorithm by Charikar, Chen and Farach-Colton [12], which is
a classic algorithm for identifying $\ell_2$-based heavy-hitters and for estimating item frequencies in
data streams. The CountSketch($C$, $s$) structure consists of $s$ hash tables denoted $T_1, \ldots, T_s$, each
having $C$ buckets. Each bucket stores an $O(\log(nmM))$-bit integer. The $j$th hash table uses the hash
function $h_j : [n] \to [C]$, for $j = 1, 2, \ldots, s$. The hash functions are chosen independently and
randomly from a pair-wise independent hash family mapping $[n] \to [C]$. A pair-wise independent
Rademacher family $\{\xi_j(i)\}_{i \in [n]}$ is associated with each table index $j \in [s]$, that is, $\xi_j(i) \in_R \{-1, 1\}$.
The Rademacher families for different $j$'s are independent. Corresponding to a stream update of
the form $(i, v)$, all tables are updated as follows.

    for j = 1 to s do
        T_j[h_j(i)] = T_j[h_j(i)] + v · ξ_j(i)
    endfor

Given an index $i \in [n]$, the estimate $\hat{f}_i$ returned for $f_i$ is the median of the estimates obtained from
each table, namely,

\[ \hat{f}_i = \mathrm{median}_{j=1}^{s} \; T_j[h_j(i)] \cdot \xi_j(i) . \]
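
The update and query rules above can be sketched in a few lines. This is a toy version under a simplifying assumption: the hash functions $h_j$ and signs $\xi_j$ are stored as explicit fully random tables over $[n]$ (indexed from 0 here), rather than drawn from the pair-wise independent families described in the text. The class name and constructor arguments are illustrative.

```python
import random
from statistics import median

class CountSketch:
    """Toy CountSketch(C, s): s tables with C buckets each.

    Assumption: h_j and xi_j are explicit random lookup tables, which costs
    O(n) space and is used here only to keep the example short.
    """

    def __init__(self, n, C, s, seed=0):
        rng = random.Random(seed)
        self.tables = [[0] * C for _ in range(s)]
        self.h = [[rng.randrange(C) for _ in range(n)] for _ in range(s)]
        self.xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(s)]

    def update(self, i, v):
        # T_j[h_j(i)] += v * xi_j(i) for every table j
        for j, table in enumerate(self.tables):
            table[self.h[j][i]] += v * self.xi[j][i]

    def estimate(self, i):
        # median over tables of T_j[h_j(i)] * xi_j(i)
        return median(t[self.h[j][i]] * self.xi[j][i]
                      for j, t in enumerate(self.tables))

cs = CountSketch(n=100, C=16, s=5)
cs.update(7, 10)
cs.update(7, -4)   # turnstile deletion
print(cs.estimate(7))
```

With a single item in the stream there are no collisions, so every table returns the exact frequency; with many items, each table's estimate is perturbed by the signed mass of colliding items and the median suppresses outliers.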

It is shown in [12] using an elegant argument that, with high probability,

\[ |\hat{f}_i - f_i| \le \left( \frac{8 F_2^{res}(C/8)}{C} \right)^{1/2} . \tag{1} \]


2 Taylor polynomial estimator 

Let $X$ be a random variable with $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$. Singh in [29] considered the following
problem: Given a function $\psi : \mathbb{R} \to \mathbb{R}$, design an unbiased estimator $\theta$ for $\psi(E[X])$ (i.e., $E[\theta] =
\psi(E[X])$). His solution for an analytic function $\psi$ was the following. Let $\psi(t) = \sum_{k \ge 0} \gamma_k(0) t^k$. Let $\nu$
be a distribution over $\mathbb{N}$ with probability mass function $p_\nu(n)$, for $n = 0, 1, 2, \ldots$. Choose $n \sim \nu$
and define the estimator

\[ \theta = (p_\nu(n))^{-1} \cdot \gamma_n(0) \cdot X_1 \cdot X_2 \cdots X_n \]

where the $X_i$'s are independent copies of $X$. The estimator satisfies

\[ E[\theta] = \sum_{n \ge 0} (p_\nu(n))^{-1} \cdot p_\nu(n) \cdot \gamma_n(0) E[X_1] E[X_2] \cdots E[X_n] = \sum_{n \ge 0} \gamma_n(0)\mu^n = \psi(\mu) . \]

However, the variance can be large; for the geometric distribution $\nu$ with $p_\nu(n) = q(1 - q)^n$, for
$n \ge 0$ and $0 < q < 1$, it is shown in [10] that $E[\theta^2] = (1/q) \sum_{n \ge 0} \gamma_n^2(0) \left( (\mu^2 + \sigma^2)/(1 - q) \right)^n$.


2.1 Taylor Polynomial Estimator 

The Taylor polynomial estimator (abbreviated as TP estimator) is derived from the Taylor series
of $\psi(\mu) = \psi(\lambda + (\mu - \lambda))$ by expanding it around $\lambda$, an estimate of $\mu$, and then truncating it
after the first $k + 1$ terms. Let $X_1, \ldots, X_k$ be independent variables with the same expectation
$E[X_j] = \mu = E[X]$ and whose variance is each bounded above by $\sigma^2$. Define

\[ \vartheta(\psi, \lambda, k, \{X_l\}_{l=1}^{k}) = \sum_{j=0}^{k} \gamma_j(\lambda)(X_1 - \lambda)(X_2 - \lambda) \cdots (X_j - \lambda) , \]

where $\gamma_j(t)$ is the function $\psi^{(j)}(t)/j!$, for $j = 0, 1, \ldots$. Its expectation and variance properties are
given below. Let $\eta^2 = E[(X_j - \lambda)^2] = \sigma^2 + (\mu - \lambda)^2$, for $j = 1, \ldots, k$.
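
The definition above can be sketched for the case $\psi(t) = t^p$, for which $\gamma_j(t) = \binom{p}{j} t^{p-j}$. The helper names `gamma_power` and `tp_estimator` are hypothetical, and the deterministic inputs in the usage lines are chosen only so that the output can be checked by hand.

```python
from math import factorial

def gamma_power(p, j, t):
    """gamma_j(t) = psi^(j)(t) / j! for psi(t) = t^p, i.e. binom(p, j) * t^(p - j)."""
    coeff = 1.0
    for m in range(j):
        coeff *= p - m
    return coeff / factorial(j) * t ** (p - j)

def tp_estimator(p, lam, xs):
    """Taylor polynomial estimator of mu^p around lam from samples xs = [X_1, ..., X_k]."""
    total, prod = 0.0, 1.0           # prod holds (X_1 - lam) ... (X_j - lam)
    for j in range(len(xs) + 1):
        total += gamma_power(p, j, lam) * prod
        if j < len(xs):
            prod *= xs[j] - lam
    return total

# If every X_j equals mu exactly, the estimator reduces to the degree-k Taylor
# polynomial of t^p around lam evaluated at mu.
exact_cubic = tp_estimator(3, 9.5, [10.0] * 5)      # integral p: polynomial is exact
near_power = tp_estimator(2.5, 9.5, [10.0] * 8)     # non-integral p: small remainder
print(exact_cubic, near_power, 10 ** 2.5)
```

In the algorithm the $X_j$ are noisy unbiased estimates rather than constants; each term multiplies distinct independent samples, which is what keeps the expectation equal to the Taylor polynomial evaluated at $\mu$.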

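Lemma 1. Let $\{X_j\}_{j=1}^{k}$ be independent random variables with expectation $\mu$ and standard deviation
at most $\sigma$. Let $\eta = (\sigma^2 + (\mu - \lambda)^2)^{1/2}$ and let $\psi$ be analytic in the region $[\lambda, \mu]$. Then the following
hold.

1. For some $\lambda' \in (\mu, \lambda)$, $|E[\vartheta(\psi, \lambda, k, \{X_j\}_{j=1}^{k})] - \psi(\mu)| \le |\gamma_{k+1}(\lambda')| \cdot |\mu - \lambda|^{k+1}$.

2. $\mathrm{Var}[\vartheta(\psi, \lambda, k, \{X_j\}_{j=1}^{k})] \le \left( \sum_{j=1}^{k} |\gamma_j(\lambda)| \eta^j \right)^2$.

Corollaries 2 and 3 apply the Taylor polynomial estimator to $\psi(t) = t^p$.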

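Corollary 2. Assume the premises of Lemma 1. Further, let $\psi(t) = t^p$, $p \ge 2$, $\mu > 0$, $|\lambda - \mu| \le \alpha\mu$,
for some $0 \le \alpha < 1/2$ and $k + 1 > p$. Then,

\[ \left| E[\vartheta(t^p, \lambda, k, \{X_l\}_{l=1}^{k})] - \mu^p \right| \le \left( \frac{\alpha}{1 - \alpha} \right)^{k+1} \left( \frac{p}{k+1} \right)^{\lfloor p \rfloor + 1} \mu^p . \]

In particular, for $p$ integral, $E[\vartheta(t^p, \lambda, k, \{X_l\}_{l=1}^{k})] = \mu^p$.

Corollary 3. Assume the premises of Lemma 1 and Corollary 2. Then

\[ \mathrm{Var}[\vartheta(t^p, \lambda, k, \{X_l\}_{l=1}^{k})] \le (1.08) p^2 \mu^{2p-2} \eta^2 . \]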


2.2 Averaged Taylor polynomial estimator 

We use a version of the Gilbert-Varshamov theorem from [1].

Theorem 4 (Gilbert-Varshamov). For positive integers $q \ge 2$ and $k > 1$, and real value $0 <
\epsilon < 1 - 1/q$, there exists a set $C \subset \{0,1\}^{qk}$ of binary vectors with exactly $k$ ones such that $C$ has
minimum Hamming distance $2\epsilon k$ and $\log|C| > (1 - H_q(\epsilon)) k \log q$, where $H_q$ is the $q$-ary entropy
function $H_q(x) = -x \log_q \frac{x}{q-1} - (1 - x) \log_q(1 - x)$.

Corollary 5. For $k \ge 1$, there exists a code $Y \subset \{0,1\}^{8k}$ such that $|Y| \ge 2^{0.08k}$, each $y \in Y$ has
exactly $k$ 1's, and the minimum Hamming distance among distinct codewords in $Y$ is $3k/2$.
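
The constant in Corollary 5 can be reproduced directly from Theorem 4 with $q = 8$ and $\epsilon = 3/4$. The short check below evaluates the $q$-ary entropy and the resulting exponent in base 2; the function name `q_ary_entropy` is an illustrative choice.

```python
from math import log

def q_ary_entropy(x, q):
    """H_q(x) = -x log_q(x / (q - 1)) - (1 - x) log_q(1 - x), for 0 < x < 1."""
    return -x * log(x / (q - 1), q) - (1 - x) * log(1 - x, q)

q, eps = 8, 0.75
h = q_ary_entropy(eps, q)
# Theorem 4 gives log|Y| > (1 - H_q(eps)) * k * log q; in bits per unit of k:
rate_bits = (1 - h) * log(q, 2)
print(h, rate_bits)
```

The entropy evaluates to roughly 0.97226 and the rate to roughly 0.0832 bits per unit of $k$, which is where the bound $|Y| \ge 2^{0.08k}$ comes from; the minimum distance $2\epsilon k$ equals $3k/2$.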

Let $Y$ be a code as given by Corollary 5. Each $y \in Y$ is a boolean vector $y = (y(1), y(2), \ldots, y(s))$
of dimension $s = 8k$ with exactly $k$ 1's. It can be equivalently viewed as a $k$-dimensional ordered
sequence $y = (y_1, y_2, \ldots, y_k)$ where $1 \le y_1 < y_2 < \cdots < y_k \le s$, and $y_j$ is the index of the $j$th
occurrence of 1 in $y$. Let $\pi : [k] \to [k]$ be a permutation and $y = (y_1, \ldots, y_k)$ be an ordered sequence
of size $k$. Then, $\pi(y)$ denotes the sequence of indices $(y_{\pi(1)}, \ldots, y_{\pi(k)})$.

Let $X_1, X_2, \ldots, X_s$ be independent random variables with expectation $\mu$ and standard deviation
at most $\sigma$. We first define the Taylor polynomial estimator, denoted tp estimator, for given (i)
an estimate $\lambda$ for $\mu$, (ii) a codeword $y \in Y$, and (iii) a permutation $\pi : [k] \to [k]$. The tp estimator
corresponding to $y \in Y$ and permutation $\pi$ is defined as

\[ \vartheta(\psi, \lambda, k, s, y, \pi, \{X_l\}_{l=1}^{s}) = \sum_{v=0}^{k} \gamma_v(\lambda) \prod_{l=1}^{v} (X_{y_{\pi(l)}} - \lambda) . \]


Let $\{\pi_y\}_{y \in Y}$ denote a set of $|Y|$ randomly and independently chosen permutations that map $[k] \to
[k]$ and that is placed in (arbitrary) 1-1 correspondence with $Y$. The averaged Taylor polynomial
estimator AVGTP averages the $|Y|$ tp estimators corresponding to each codeword in $Y$, ordered by
the permutations $\{\pi_y\}_{y \in Y}$ respectively, as follows.

\[ \bar{\vartheta}(\psi, \lambda, k, s, Y, \{\pi_y\}_{y \in Y}, \{X_l\}_{l=1}^{s}) = \frac{1}{|Y|} \sum_{y \in Y} \vartheta(\psi, \lambda, k, s, y, \pi_y, \{X_l\}_{l=1}^{s}) \tag{2} \]

The Taylor polynomial estimator in the RHS of Eqn. (2) corresponding to each $y \in Y$ is referred to
simply as $\vartheta_y$, when the other parameters are clearly understood from context. Note that for any
$y \in Y$ and permutation $\pi_y$, $E[\vartheta_y]$ is the same. Therefore, due to averaging, the AVGTP estimator
has the same expectation as the expectation of each of the $\vartheta_y$'s.
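
The averaging in Eqn. (2) can be sketched as follows for $\psi(t) = t^p$. The codewords used here are a hypothetical toy family of 3-subsets of 12 indices (0-based) and are not a code satisfying Corollary 5; the helper names are likewise illustrative.

```python
import random
from math import factorial

def gamma_power(p, v, t):
    """gamma_v(t) for psi(t) = t^p, i.e. binom(p, v) * t^(p - v)."""
    coeff = 1.0
    for m in range(v):
        coeff *= p - m
    return coeff / factorial(v) * t ** (p - v)

def tp(p, lam, xs, y, pi):
    """tp estimator for codeword y (sorted 0-based indices) and permutation pi of range(k)."""
    total, prod = 0.0, 1.0
    k = len(y)
    for v in range(k + 1):
        total += gamma_power(p, v, lam) * prod
        if v < k:
            prod *= xs[y[pi[v]]] - lam   # next factor uses sample X at index y_{pi(v+1)}
    return total

def avg_tp(p, lam, xs, code, perms):
    """AVGTP: the plain average of the tp estimators over all codewords."""
    return sum(tp(p, lam, xs, y, pi) for y, pi in zip(code, perms)) / len(code)

rng = random.Random(1)
code = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 4, 8)]    # toy codewords, k = 3, s = 12
perms = [rng.sample(range(3), 3) for _ in code]        # one random permutation per codeword
print(avg_tp(3, 9.5, [10.0] * 12, code, perms))
```

Because codewords share few indices, the individual tp estimators are only weakly dependent, which is the property the variance analysis exploits.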

Lemma 6. Let $p \ge 2$, $q = 8$, $k \ge \max(1000, 40(\lfloor p \rfloor + 2))$ and $s = qk$. Let $Y \subset \{0,1\}^s$ such that, (a)
$|Y| \ge 2^{0.08k}$, (b) each $y \in Y$ has exactly $k$ ones, and (c) the minimum Hamming distance among
distinct codewords in $Y$ is $3k/2$. Let $\{X_1, \ldots, X_s\}$ be a family of independent random variables,
each having expectation $\mu > 0$ and variance bounded above by $\sigma^2$. Let $\lambda$ be an estimate for $\mu$
satisfying $|\lambda - \mu| \le \min(\mu, \lambda)/(25p)$ and let $\sigma \le \min(\mu, \lambda)/(25p)$. Let $\eta = ((\lambda - \mu)^2 + \sigma^2)^{1/2} > 0$.

Let $\bar{\vartheta}$ denote $\bar{\vartheta}(t^p, \lambda, k, s, Y, \{\pi_y\}_{y \in Y}, \{X_i\}_{i=1}^{s})$. Then


Var [-!?] < 


(0.288)p2 

k 




3 Algorithm 

The Geometric-Hss algorithm uses a level-wise structure corresponding to levels $l = 0, 1, \ldots, L$,
where the values of $L$ and the other parameters are given in Figure 2.

Level-wise structures

Corresponding to each level $l = 0, 1, \ldots, L - 1$, a pair of structures (HH$_l$, TPEST$_l$) are kept, where
HH$_l$ is a CountSketch($16C_l$, $s$) structure with $s = O(\log n)$ hash tables each consisting of $16C_l$
buckets. The TPEST$_l$ structure is used by the Taylor polynomial estimator at level $l$ and is a
standard CountSketch($16C_l$, $2s$) structure with the following minor changes.

(a) The hash functions used for the hash tables are 6-wise independent.

(b) The Rademacher family $\{\xi_{lr}(i)\}_{i \in [n]}$ is 4-wise independent for each table index $r \in [2s]$, and is
independent across the $r$'s, $r \in [2s]$.

The hash tables $\{T_{lr}\}_{r \in [2s]}$ have $16C_l$ buckets each and use the hash function $h_{lr}$, for $r \in [2s]$.
Corresponding to the final level $L$, only an HH$_L$ structure is kept, which is a CountSketch($C_L$, $s$)
structure, where $C_L = 16(4\alpha^L C)$. The structure at level $L$ uses $O(1)$ times larger space for HH$_L$, to
facilitate the discovery of all items and their frequencies mapping to this level (with very high
probability).

Description of parameter                           Parameter and its value
Number of levels                                   $L = \lceil \log_{2\alpha}(n/C) \rceil$
Reduction factor                                   $\alpha = 1 - (1 - 2/p)\nu$, $\nu = 0.01$
Basic space parameters                             $B = \left\lceil \frac{425 (2\alpha)^{p/2} n^{1-2/p} \epsilon^{-2}}{\min(\epsilon^{4/p-2}, \log(n))} \right\rceil$, $C = (27p)^2 B$
Level-wise space parameters                        $B_l = 4\alpha^l B$, $l = 0, 1, \ldots, L-1$
                                                   $C_l = 4\alpha^l C$, $l = 0, 1, \ldots, L-1$
                                                   $C_L = 16(4\alpha^L C)$
Degree of independence of $g_l$                    $d = 50\lceil \log n \rceil$
Taylor polynomial estimator parameters             $k = 1000\lceil \log n \rceil$, $r = 16k$, $s = 8k$
Degree of independence of table hash functions     $t = 11$

Figure 2: Parameters used by the Geometric-Hss algorithm.

Hierarchical Sub-sampling 

The original stream $S$ is sub-sampled hierarchically to produce random sub-streams for each of
the levels $S_0 = S \supset S_1 \supset S_2 \supset \cdots \supset S_L$, where $S_l$ is the sub-stream that maps to level $l$. The
stream $S_0$ is the entire input stream. $S_1$ is obtained by sampling each item $i$ appearing in $S_0$ with
probability 1/2; if $i$ is sampled, then all its records $(i, v)$ are included in $S_1$, otherwise none of its
records are included. In general, $S_{l+1}$ is obtained by sampling items from $S_l$ with probability 1/2,
so that $\Pr[i \in S_{l+1} \mid i \in S_l] = 1/2$. This is done by a sequence of independently chosen random
hash functions $g_1, g_2, \ldots, g_L$, each mapping $[n] \to \{0, 1\}$. Then,

\[ i \in S_l \text{ iff } g_1(i) = 1, g_2(i) = 1, \ldots, g_l(i) = 1, \quad l = 1, 2, \ldots, L . \]

If $i \in S_l$, then for each stream update of the form $(i, v)$, the update is propagated to the structures
HH$_l$ and TPEST$_l$.
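
The nested sub-streams can be sketched as follows. As a simplifying assumption, each $g_l$ is stored as an explicit random bit table over $[n]$ (0-based), whereas the paper draws the $g_l$ from hash families of limited independence; the name `make_level_fn` is hypothetical.

```python
import random

def make_level_fn(n, L, seed=0):
    """Return max_level(i): the largest l such that g_1(i) = ... = g_l(i) = 1.

    Assumption: each g_l is an explicit random bit table, used only to keep
    the example short.
    """
    rng = random.Random(seed)
    g = [[rng.randrange(2) for _ in range(n)] for _ in range(L)]

    def max_level(i):
        level = 0
        while level < L and g[level][i] == 1:
            level += 1
        return level   # item i belongs to S_0, S_1, ..., S_level

    return max_level

max_level = make_level_fn(n=2000, L=5, seed=1)
levels = [max_level(i) for i in range(2000)]
in_s1 = sum(1 for lvl in levels if lvl >= 1)
print(in_s1, max(levels))
```

An update $(i, v)$ would then be forwarded to the structures of every level from 0 up to `max_level(i)`, so that all records of a sampled item reach the same sub-streams.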


Group thresholds and Sampling into groups 

Let F 2 be an estimate satisfying F 2 < F 2 < {1 + 0.01/{2p))F2 with probability 1 — and is 
computed using random bits that are independent of the ones used in the above structures. 

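Let $\bar{\epsilon} = (B/C)^{1/2} = 1/(27p)$. The level-wise thresholds are defined as follows.

\[ T_0 = \left( \frac{\hat{F}_2}{B} \right)^{1/2}, \quad T_l = \left( \frac{1}{2\alpha} \right)^{l/2} T_0, \quad Q_l = T_l - \bar{\epsilon} T_l, \; l \in \{0\} \cup [L-1], \quad Q_L = 1/2 \tag{3} \]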


Let $\hat{f}_{il}$ be the estimate for $f_i$ obtained from level $l$ using HH$_l$. For $l \in \{0\} \cup [L-1]$, we say that
$i$ is "discovered" at level $l$, or that $l_d(i) = l$, if $l$ is the smallest level such that $|\hat{f}_{il}| \ge Q_l$. Define
$\hat{f}_i = \hat{f}_{i, l_d(i)}$. $l_d(i)$ is set to $L$ iff $i \in S_L$ and $i$ has not been discovered at any earlier level.

Items are placed into sample groups, denoted by $\bar{G}_l$, for $l \in \{0\} \cup [L]$, as follows. An item $i$ is
placed into the sampled group $\bar{G}_l$ if one of the following holds.

1. $i$ is discovered at level $l$ and $|\hat{f}_{il}| \ge T_l$.

2. $i$ is discovered at level $l - 1$ but $|\hat{f}_{i,l-1}| < T_{l-1}$ and the flip of an unbiased coin $K_i$ turns up
heads.

An item $i$ is placed in $\bar{G}_0$ if $|\hat{f}_{i0}| \ge T_0$. In other words, the sample groups are defined as follows.

\[ \bar{G}_0 = \{i : |\hat{f}_i| \ge T_0\}, \]
\[ \bar{G}_l = \{i : (l_d(i) = l \text{ and } |\hat{f}_i| \ge T_l) \text{ or } (l_d(i) = l-1 \text{ and } |\hat{f}_i| < T_{l-1} \text{ and } K_i = 1)\}, \quad l = 1, 2, \ldots, L-1, \]
\[ \bar{G}_L = \{i : l_d(i) = L \text{ or } (l_d(i) = L-1 \text{ and } |\hat{f}_i| < T_{L-1} \text{ and } K_i = 1)\} . \]

We refer to an item as being sampled if it belongs to a sample group. From the construction above,
it follows that (1) only an item that is discovered may be sampled, and (2) if $i \in [n]$ is discovered
at level $l$, then $i$ may belong to sampled group $\bar{G}_l$ or to the sampled group $\bar{G}_{l+1}$, or to neither (and
hence to no sampled group). That is, there is a possibility that discovered items are not sampled
(this happens when $Q_l \le |\hat{f}_{il}| < T_l$ and $K_i = 0$ (tails)).
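
The discovery and sampling rules can be sketched as a small decision procedure for a single item. The data layout is an assumption made for illustration: `est_by_level[l]` holds the level-$l$ estimate of $f_i$, or `None` when the item is not in $S_l$; `T` lists $T_0, \ldots, T_{L-1}$; `Q` lists at least $Q_0, \ldots, Q_{L-1}$. The threshold values in the usage lines are hypothetical.

```python
def discovery_level(est_by_level, Q, L):
    """Smallest level whose estimate reaches Q_l; level L discovers any item present in S_L."""
    for l, est in enumerate(est_by_level):
        if est is None:
            continue
        if l == L or abs(est) >= Q[l]:
            return l
    return None   # the item is never discovered

def sample_group(est_by_level, T, Q, coin):
    """Index of the sampled group of an item, or None if it is not sampled."""
    L = len(T)                       # T holds T_0 .. T_{L-1}
    ld = discovery_level(est_by_level, Q, L)
    if ld is None:
        return None
    if ld == L or abs(est_by_level[ld]) >= T[ld]:
        return ld                    # rule 1, or discovery at the final level
    return ld + 1 if coin == 1 else None   # rule 2: estimate in [Q_l, T_l), coin decides

# Hypothetical thresholds for L = 2.
T = [100.0, 70.0]
Q = [96.0, 67.0, 0.5]
print(sample_group([120, 118, None], T, Q, coin=0),
      sample_group([97, 98, None], T, Q, coin=1),
      sample_group([97, 98, None], T, Q, coin=0))
```

The third call shows a discovered item that is not sampled: its level-0 estimate lies between $Q_0$ and $T_0$ and its coin shows tails.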


The NOCOLLISION event 

Let TOPK$_l(C_l)$ be the set of the top-$C_l$ elements in terms of the estimates $|\hat{f}_{il}|$ at level $l$. For
$l \in \{0\} \cup [L]$, NOCOLL$_l$ is said to hold if for each $i \in$ TOPK$_l(C_l)$, there exists a set $R_l(i) \subset [2s]$ of
indices of hash tables of the structure TPEST$_l$ such that $|R_l(i)| \ge s$ and that $i$ does not collide with
any other item of TOPK$_l(C_l)$ in the buckets $h_{lq}(i)$, for $q \in R_l(i)$. More precisely,

\[ \text{NOCOLL}_l \equiv \forall i \in \text{TOPK}_l(C_l), \; \exists R_l(i) \subset [2s] \; \big( |R_l(i)| \ge s \text{ and } \forall q \in R_l(i), \forall j \in \text{TOPK}_l(C_l) \setminus \{i\}, \; h_{lq}(i) \ne h_{lq}(j) \big) . \tag{4} \]

The event NOCOLL is defined as

\[ \text{NOCOLL} \equiv \bigwedge_{l=0}^{L} \text{NOCOLL}_l . \]

The analysis shows NOCOLL to be a very high probability event; however, if NOCOLL fails, then the
estimate for $F_p$ returned is 0.
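
The condition in Eqn. (4) for a single level can be checked directly. In the sketch below, `bucket_of[q][i]` is a hypothetical layout giving the bucket $h_{lq}(i)$ of item $i$ in table $q$; the function returns whether every top item is isolated from the other top items in at least $s$ tables.

```python
def nocoll_level(top_items, bucket_of, s):
    """Check NOCOLL_l: each top item is collision-free in at least s of the tables."""
    for i in top_items:
        isolated = [
            q for q in range(len(bucket_of))
            if all(bucket_of[q][i] != bucket_of[q][j] for j in top_items if j != i)
        ]
        if len(isolated) < s:   # no valid set R_l(i) of size s exists
            return False
    return True

# Hypothetical example with 2s = 4 tables and three top items.
bucket_of = [
    {0: 0, 1: 1, 2: 2},   # all isolated
    {0: 0, 1: 0, 2: 1},   # items 0 and 1 collide
    {0: 3, 1: 4, 2: 5},   # all isolated
    {0: 1, 1: 1, 2: 1},   # everything collides
]
print(nocoll_level([0, 1, 2], bucket_of, s=2), nocoll_level([0, 1, 2], bucket_of, s=3))
```

Collisions with items outside the top set are allowed by the definition; only mutual collisions among the top items count.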

The estimator Fp 

Assume that the event NOCOLL holds, otherwise, $\hat{F}_p$ is set to 0. For each item $i$ that is discovered
at level $l_d(i) < L$ and is sampled into the sampled group at level $l_s(i)$, the averaged Taylor polynomial
estimator is used to obtain an estimate of $|f_i|^p$ using the structure TPEST$_{l_d(i)}$ at level $l_d(i)$ and scaled
by a factor of $2^{l_s(i)}$ to compensate for sampling. If $l_d(i) = l_s(i) = L$, then the simpler estimator $|\hat{f}_i|^p$
is used instead and the resulting estimate is scaled by $2^L$.

The parameter $\lambda$ used in the Taylor polynomial estimator for estimating $|f_i|^p$ is set to $|\hat{f}_i| =
|\hat{f}_{i, l_d(i)}|$. Let $l = l_d(i)$. By NOCOLL, let $R_l(i) = \{t_1, t_2, \ldots, t_s\} \subset [2s]$. Let $X_{ijl}$ be the (standard)
estimate for $|f_i|$ obtained from table $T_{lj}$, that is,

\[ X_{ijl} = T_{lj}[h_{lj}(i)] \cdot \xi_{lj}(i) \cdot \mathrm{sgn}(\hat{f}_i), \quad \text{for } j \in R_l(i) . \]

The estimator $\vartheta_i$ is defined as

\[ \vartheta_i = \bar{\vartheta}(t^p, |\hat{f}_i|, k, s, Y, \{\pi_y\}_{y \in Y}, \{X_{ijl}\}_{j \in R_l(i)}) , \]

where $Y$ is a code satisfying Corollary 5 and $\{\pi_y\}_{y \in Y}$ is a family of independently and randomly
chosen permutations from $[k] \to [k]$. The parameters $k$ and $s$ are given in Figure 2. The estimator
$\hat{F}_p$ for $F_p$ is defined below.

\[ \hat{F}_p = \sum_{l=0}^{L} \sum_{i \in \bar{G}_l, \, l_d(i) < L} 2^l \vartheta_i + \sum_{i \in \bar{G}_L, \, l_d(i) = L} 2^L |\hat{f}_i|^p . \tag{5} \]


4 Analysis 

In this section, we analyze the Geometric-Hss algorithm. 


4.1 The event Q 

Let $F_2^{res}(k, l)$ denote the (random) $k$-residual second moment of the frequency vector corresponding
to $S_l$. The analysis is conditioned on the conjunction of a set of events denoted by $\mathcal{G}$, as defined in
Figure 3.

The events comprising $\mathcal{G}$ are as follows. GOODF2 is the event that $\hat{F}_2$ is a $1 + O(1/p)$-factor
approximation of $F_2$. The event GOODEST states that for all $i \in [n]$ and levels $l \in \{0\} \cup [L]$,
the frequency estimation errors incurred by the HH$_l$ structure remain within the high-probability
error bound for the CountSketch algorithm [12] given by Eqn. (1). However, the bounds in GOODEST
have to be expressed in terms of $F_2^{res}(2C_l, l)$, which are themselves random variables. The event
SMALLRES gives some control on this random variable by giving an upper bound on $F_2^{res}(2C_l, l)$
as $1.5 F_2^{res}((2\alpha)^l C)/2^{l-1}$. The event ACCUEST holds if the frequency estimation for an item $i$ at a certain
level $l$ has an additive accuracy of $\left( F_2^{res}((2\alpha)^l C) / (2(2\alpha)^l C) \right)^{1/2}$. The bounds given by ACCUEST are non-random
functions of $l$. An item $i$ is classified as a heavy-hitter at level $l$ if $|\hat{f}_{il}| \ge Q_l$, that is, its estimate
obtained from the HH$_l$ structure exceeds the threshold $Q_l$. The event SMALLHH is said to hold if at
each level, each heavy-hitter item at that level is among those with the top-$C_l$ absolute estimated
frequencies at that level. The NOCOLLISION event is used only by the TPEST family of structures
at each level, and ensures that each heavy-hitter remains isolated from all the other heavy-hitters
of that level in at least half ($s$) of the tables of the TPEST structure at that level.

(1) GOODF2          $\equiv F_2 \le \hat{F}_2 \le \left(1 + \frac{0.001}{2p}\right) F_2$
(2) NOCOLL          $\equiv$ defined in (4)
(3) GOODEST         $\equiv \forall l : 0 \le l \le L, \; \forall i \in [n], \; |\hat{f}_{il} - f_i| \le \left( F_2^{res}(2C_l, l)/C_l \right)^{1/2}$
(4) SMALLRES        $\equiv \forall l : 0 \le l \le L, \; F_2^{res}(2C_l, l) \le 1.5 F_2^{res}((2\alpha)^l C)/2^{l-1}$
(5) ACCUEST         $\equiv \forall l : 0 \le l \le L, \; \forall i \in [n], \; |\hat{f}_{il} - f_i| \le \left( F_2^{res}((2\alpha)^l C)/(2(2\alpha)^l C) \right)^{1/2}$
(6) GOODFINALLEVEL  $\equiv \forall i \in S_L, \; \hat{f}_{iL} = f_i$
(7) SMALLHH         $\equiv \forall l : 0 \le l \le L, \; \{i : |\hat{f}_{il}| \ge Q_l\} \subset$ TOPK$(C_l)$

Figure 3: $\mathcal{G}$ is the conjunction of these 7 events.

Lemma 7 shows that $\mathcal{G}$ holds except with inverse polynomial probability.

Lemma 7. For the choice of parameters in Figure 2, $\mathcal{G}$ holds with probability $1 - O(n^{-24})$.


4.2 Grouping items by frequencies 

Items are divided into groups based upon frequency ranges, as follows.

\[ G_0 = \{i : |f_i| \ge T_0\} \]
\[ G_l = \{i : T_l \le |f_i| < T_{l-1}\}, \quad l = 1, 2, \ldots, L-1 \]
\[ G_L = \{i : 1 \le |f_i| < T_{L-1}\} . \]

Note that this grouping is for purposes of analysis, since the true frequencies are unknown to the
algorithm. Since estimated frequencies may have errors, it is possible that the sampling algorithm
samples an item $i$ into the sampled group $\bar{G}_l$, although the item does not belong to the group
$G_l$. It will be useful to understand the conditions under which such errors do not occur, and the
conditions under which such errors may occur and their extent.

Each group is further partitioned into subsets defined by frequency ranges, namely, lmargin($G_l$),
mid($G_l$) and rmargin($G_l$).

\[ \text{lmargin}(G_l) = \{i : T_l \le |f_i| < T_l(1 + \bar{\epsilon})\}, \quad l = 0, \ldots, L-1, \]
\[ \text{rmargin}(G_l) = \{i : T_{l-1}(1 - 2\bar{\epsilon}) \le |f_i| < T_{l-1}\}, \quad l \in [L], \]
\[ \text{mid}(G_l) = \{i : T_l + T_l\bar{\epsilon} \le |f_i| < T_{l-1} - 2T_{l-1}\bar{\epsilon}\}, \quad l \in [L-1], \]
\[ \text{mid}(G_0) = \{i : |f_i| \ge T_0(1 + \bar{\epsilon})\}, \]
\[ \text{mid}(G_L) = \{i : 1 \le |f_i| < T_{L-1}(1 - 2\bar{\epsilon})\} . \]

$G_0$ and $G_L$ have no rmargin($G_0$) and lmargin($G_L$) defined, respectively. These definitions are
similar (though not identical) to the Hss algorithm [15]. The ratio $T_{l-1}/T_l = (2\alpha)^{1/2}$, for $l = 1, 2, \ldots, L-1$.
The last group $G_L$ has frequency range $[1, T_{L-1})$ and the frequency ratio $T_{L-1}/1$ can be large.

4.3 Properties of the sampling scheme 

In the remainder of this paper, we assume that $c > 23$ is a constant satisfying $\Pr[\neg\mathcal{G}]/\Pr[\mathcal{G}] \le n^{-c}$.

Basic Property 

Lemma 8 presents the basic property of the sampling scheme.

Lemma 8. Let $i \in G_l$.

1. Let $i \in \text{mid}(G_l)$. Then,

\[ \left| 2^l \Pr[i \in \bar{G}_l \mid \mathcal{G}] - 1 \right| \le 2^l n^{-c} . \]

Further, conditional on $\mathcal{G}$, (i) $i \in \bar{G}_l$ iff $i \in S_l$, and, (ii) $i$ may not belong to any $\bar{G}_{l'}$, for
$l' \ne l$, that is, (i) $\Pr[i \in \bar{G}_l \mid \mathcal{G}] = \Pr[i \in S_l \mid \mathcal{G}] = 2^{-l} \pm n^{-c}$, and, (ii) $\Pr[i \in \cup_{l' \ne l} \bar{G}_{l'} \mid \mathcal{G}] = 0$.

2. Let $i \in \text{lmargin}(G_l)$. Then

\[ \left| 2^{l+1} \Pr[i \in \bar{G}_{l+1} \mid \mathcal{G}] + 2^l \Pr[i \in \bar{G}_l \mid \mathcal{G}] - 1 \right| \le 2^l n^{-c} . \]

Further, conditional on $\mathcal{G}$, $i$ may belong to either $\bar{G}_l$ or $\bar{G}_{l+1}$, but not to any other sampled
group, that is, $\Pr[i \in \cup_{l' \notin \{l, l+1\}} \bar{G}_{l'} \mid \mathcal{G}] = 0$.

3. If $i \in \text{rmargin}(G_l)$, then

\[ \left| 2^l \Pr[i \in \bar{G}_l \mid \mathcal{G}] + 2^{l-1} \Pr[i \in \bar{G}_{l-1} \mid \mathcal{G}] - 1 \right| \le O(2^l n^{-c}) . \]

Further, conditional on $\mathcal{G}$, $i$ can belong to either $\bar{G}_{l-1}$ or $\bar{G}_l$ and not to any other sampled
group, that is, $\Pr[i \in \cup_{l' \notin \{l-1, l\}} \bar{G}_{l'} \mid \mathcal{G}] = 0$.

Lemma 8 is essentially true (with minor changes) for the Hss method [6, 15], although the
Hss analysis used full-independence of hash functions whereas here we work with limited
independence. A straightforward corollary of Lemma 8 is the following.

Corollary 9. Let $i \in G_l$. Then,

\[ \sum_{l'=0}^{L} 2^{l'} \Pr[i \in \bar{G}_{l'} \mid \mathcal{G}] = 1 \pm 2^{l+1} n^{-c} . \]

Approximate pair-wise independence property

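Lemma 10 essentially repeats the results of Lemma 8 conditional upon the event that another
item maps to a substream at some level $l$. This property is a step towards proving an approximate
pair-wise independence property in the following section.

Lemma 10. Let $i, j \in [n]$, $i \ne j$ and $j \in G_r$.

1. Let $j \in \text{mid}(G_r)$. Then

\[ \left| 2^r \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] - 1 \right| \le 2^r n^{-c} . \]

Further, for any $r' \ne r$, $\Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 0$.

2. Let $j \in \text{lmargin}(G_r)$. Then,

\[ \left| 2^{r+1} \Pr[j \in \bar{G}_{r+1} \mid i \in S_l, \mathcal{G}] + 2^r \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] - 1 \right| \le 2^{r+1} n^{-c} . \]

Further, for any $r' \notin \{r, r+1\}$, $\Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 0$.

3. If $j \in \text{rmargin}(G_r)$, then

\[ \left| 2^r \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] + 2^{r-1} \Pr[j \in \bar{G}_{r-1} \mid i \in S_l, \mathcal{G}] - 1 \right| \le O(2^r n^{-c}) . \]

Further, for any $r' \notin \{r-1, r\}$, $\Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 0$.

Corollary 11. Let $i, j \in [n]$, $i \ne j$ and $j \in G_r$. Then,

\[ \left| \sum_{r'=0}^{L} 2^{r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] - 1 \right| \le O(2^r n^{-c}) . \]

We can now prove an approximate pair-wise independence property.

Lemma 12. For $i \in G_l$, $j \in G_m$ and $i, j$ distinct,

\[ \left| \sum_{r, r'=0}^{L} 2^{r+r'} \Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] - 1 \right| \le O((2^l + 2^m) n^{-c}) . \]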


4.4 Application of Taylor Polynomial Estimator 

Let $i \in \bar{G}_{l'}$ for some $l' \in \{0\} \cup [L-1]$. Then, $i$ has been discovered at a level $l_d(i) = l$ (say). The
algorithm estimates $|f_i|^p$ from the TPEST structure at the discovery level $l$ using the estimator

\[ \vartheta_i = \bar{\vartheta}(\psi(t) = t^p, |\hat{f}_i|, k, s, Y, \{\pi_y\}_{y \in Y}, \{X_{ijl}\}_{j \in R_l(i)}) . \]

By construction, $\hat{f}_i$ is defined as $\hat{f}_{il}$ and for any $j \in R_l(i)$, $\sigma_{ijl} = (\mathrm{Var}[X_{ijl}])^{1/2}$ and $\eta_{ijl}^2 = \sigma_{ijl}^2 +
(|f_i| - |\hat{f}_{il}|)^2$. We first show that the premises of Corollary 2 and Lemma 6 are satisfied so that we
can use their implications.

Lemma 13. Assume the parameter values listed in Figure 2 and that $\mathcal{G}$ holds. Suppose $l_d(i) = l$
for some $l \in \{0\} \cup [L-1]$. Then the following properties hold.

a) $|\hat{f}_{il} - f_i| \le |f_i|/(26p)$,


b) E 


^ijl I ^diX) 


I, \fil\ > QlJ G Ri{i),0 


\fi 


c) \fi\ > for j e Ri^(i){i), 

d) ^ 2.7(eT/)2, for j e 

X \fii - fi\ < l/i|/(26p), 

f) \fi\hijh{i) > 16p, for j e Ri^{i){i), 

g) if $l_d(i) = L$, then $\hat{f}_i = f_i$ and $\eta_{iL} = 0$.

For $i, k \in S_l$, $j \in [2s]$, let $u_{ikjl} = 1$ iff $h_{lj}(i) = h_{lj}(k)$ and 0 otherwise.

Lemma 14. Assume the parameters in Figure\^ and let p > 2. Suppose i G Gi, for some I G 
{0} U [L — 1]. Then, 

|1E [di I G] -\fi\P\ < n-W|;^|P . 

Further if p is integral, then, E \ G] = 

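We denote by $\bar{\xi}$ the set of random bits defining the family of Rademacher random variables used
by the TPEST structures, that is, the set of random bits that defines the family $\{\xi_{lj}(i) \mid i \in [n], j \in
[2s], l \in \{0\} \cup [L]\}$. Lemma 15 shows that the event NOCOLL implies that the Taylor polynomial
estimators are pair-wise uncorrelated.

Lemma 15. Suppose $i \in \bar{G}_r$ and $i' \in \bar{G}_{r'}$. Then,

\[ E_{\bar{\xi}}[\vartheta_i \vartheta_{i'} \mid \hat{f}_i, \hat{f}_{i'}, \mathcal{G}] = E_{\bar{\xi}}[\vartheta_i \mid \hat{f}_i, \mathcal{G}] \, E_{\bar{\xi}}[\vartheta_{i'} \mid \hat{f}_{i'}, \mathcal{G}] . \]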


4.5 Expectation and Variance of Fp Estimator. 


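For uniformity of notation, let $\vartheta_i$ denote $|\hat{f}_i|^p$ when $l_d(i) = L$ and otherwise, let its meaning be
unchanged. Let $z_{il}$ be an indicator variable that is 1 if $i \in \bar{G}_l$ and 0 otherwise. Since an item
may be sampled into at most one group, $\sum_{l=0}^{L} z_{il} \in \{0, 1\}$. Using the extended definition of $\vartheta_i$
mentioned above, we can write $\hat{F}_p$ as,

\[ \hat{F}_p = \sum_{l=0}^{L} \sum_{i \in \bar{G}_l} 2^l \vartheta_i = \sum_{i \in [n]} \sum_{l=0}^{L} 2^l z_{il} \vartheta_i = \sum_{i \in [n]} Y_i \tag{6} \]

where

\[ Y_i = \sum_{l'=0}^{L} 2^{l'} z_{il'} \vartheta_i . \tag{7} \]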


Lemma fTBl shows that Fp is almost an unbiased estimator for Fp. This follows from Lemma [TT] 
Lemma 16. ]E[Fp | Q\ = Fp{l ± 0{n

We will use the following facts that are easily proved (see Appendix).

\[ F_2 \le n^{1-2/p} F_p^{2/p}, \quad p \ge 2, \]
\[ F_{2p-2} \le F_p^{2-2/p}, \quad p \ge 2 . \]

Lemma 17. Let B = Kn^~'^/Pe~'^/ log{n) and C = {27p)^B. Then, 


( 8 ) 


Var [Yi \G]< < 


(5)(10)-K •/»€m^O„) 

^2'+^(1.002)|/jpP ifi e lmargin{Go) Gi 


Lemma 18 builds on the approximate pair-wise independence of the sampling scheme (Lemma
12) and the pair-wise uncorrelated property of the $\vartheta_i$ estimators (Lemma 15) to show that the
$\mathrm{Cov}(Y_i, Y_j)$, for $i \ne j$, is very small.

Lemma 18. Let $i \ne j$. Then,

\[ |\mathrm{Cov}(Y_i, Y_j \mid \mathcal{G})| \le O(n^{-c+1}) |f_i|^p |f_j|^p . \]

Lemma 19 gives a bound on the variance of the $\hat{F}_p$ estimator.

Lemma 19.

\[ \mathrm{Var}[\hat{F}_p \mid \mathcal{G}] \le \frac{\epsilon^2 F_p^2}{50} . \]

Putting things together 

Theorem 20 states the space bound for the algorithm and the update time.

Theorem 20. For each fixed $p > 2$ and $0 < \epsilon \le 1$, there exists an algorithm in the general update
data stream model that returns $\hat{F}_p$ satisfying $|\hat{F}_p - F_p| \le \epsilon F_p$ with probability 3/4. The algorithm
uses space $O(n^{1-2/p}\epsilon^{-2} + n^{1-2/p}\epsilon^{-4/p}\log(n))$ words of size $O(\log(nmM))$ bits. The time taken to
process each stream update is $O(\log^2 n)$.

A Proofs for the Taylor Polynomial estimator

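Fact 21. Let $k > p \ge 0$. Then, $\left| \binom{p}{k} \right| \le \left( \frac{p}{k} \right)^{\lfloor p \rfloor + 1}$. In particular, if $p \in \mathbb{Z}^+$, then $\binom{p}{k} = 0$.

Proof. The second statement is obvious, since for $k > p \ge 0$ and $p$ integral, $p^{\underline{k}} = 0$. Otherwise, for
non-integral $p$, using the absorption identity $\lfloor p \rfloor + 1$ times, followed by the upper negation identity, gives

\[ \binom{p}{k} = \frac{p^{\underline{\lfloor p \rfloor + 1}}}{k^{\underline{\lfloor p \rfloor + 1}}} \binom{p - \lfloor p \rfloor - 1}{k - \lfloor p \rfloor - 1} = \frac{p^{\underline{\lfloor p \rfloor + 1}}}{k^{\underline{\lfloor p \rfloor + 1}}} (-1)^{k - \lfloor p \rfloor - 1} \binom{k - p - 1}{k - \lfloor p \rfloor - 1} . \]

Now, for $0 \le j \le \lfloor p \rfloor$, $\frac{p-j}{k-j} \le \frac{p}{k}$, since $p < k$. Therefore, $\frac{p^{\underline{\lfloor p \rfloor + 1}}}{k^{\underline{\lfloor p \rfloor + 1}}} \le \left( \frac{p}{k} \right)^{\lfloor p \rfloor + 1}$. Similarly,
$\binom{k - p - 1}{k - \lfloor p \rfloor - 1} \le 1$. Taking absolute values, $\left| \binom{p}{k} \right| \le \left( \frac{p}{k} \right)^{\lfloor p \rfloor + 1}$. □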

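Proof of Lemma 1. Fix $\psi$, $\lambda$ and $k$ and let $\vartheta = \vartheta(\psi, \lambda, k, X_1, \ldots, X_k)$. Using linearity of expectation and independence of the $X_l$'s, we have,

\[ E[\vartheta] = E\Big[\sum_{j=0}^{k} \gamma_j(\lambda) \prod_{l=1}^{j} (X_l - \lambda)\Big] = \sum_{j=0}^{k} \gamma_j(\lambda) \prod_{l=1}^{j} (\mu - \lambda) = \psi(\lambda + \mu - \lambda) - \gamma_{k+1}(\lambda')(\mu - \lambda)^{k+1} \]

for some $\lambda' \in (\mu, \lambda)$ by the Taylor series expansion of $\psi(\mu) = \psi(\lambda + (\mu - \lambda))$ around $\lambda$. The Taylor series expansion of $\psi(\mu)$ around $\lambda$ exists since $\psi$ is analytic in the interval $[\mu, \lambda]$. Therefore,

\[ |E[\vartheta] - \psi(\mu)| \le |\gamma_{k+1}(\lambda')| \, |\mu - \lambda|^{k+1} \]

proving part (i) of the lemma.

For $j = 0, 1, \ldots, k$, let

\[ P_j = \prod_{l=1}^{j} (X_l - \lambda) \]

(which implies that $P_0 = 1$). Then,

\[ \vartheta = \sum_{j=0}^{k} \gamma_j(\lambda) P_j . \]

By the independence of the $X_l$'s, and writing $\eta^2 = \sigma^2 + (\mu - \lambda)^2$,

\[ \mathrm{Var}[P_j] = \mathrm{Var}\Big[\prod_{l=1}^{j} (X_l - \lambda)\Big] = \prod_{l=1}^{j} E[(X_l - \lambda)^2] - \prod_{l=1}^{j} (E[X_l - \lambda])^2 = \eta^{2j} - (\mu - \lambda)^{2j} . \]

Further, for $1 \le j < j' \le k$,

\[ \mathrm{Cov}(P_j, P_{j'}) = \mathrm{Cov}\Big(\prod_{l=1}^{j} (X_l - \lambda), \prod_{l=1}^{j'} (X_l - \lambda)\Big) = E\Big[\prod_{l=1}^{j} (X_l - \lambda) \prod_{l=1}^{j'} (X_l - \lambda)\Big] - (\mu - \lambda)^{j+j'} \]
\[ = \prod_{l=1}^{j} E[(X_l - \lambda)^2] \prod_{l=j+1}^{j'} E[X_l - \lambda] - (\mu - \lambda)^{j+j'} = \eta^{2j}(\mu - \lambda)^{j'-j} - (\mu - \lambda)^{j+j'} . \]

Thus we have,

\[ \mathrm{Var}[\vartheta] = \sum_{j=0}^{k} (\gamma_j(\lambda))^2 \mathrm{Var}[P_j] + \sum_{j<j'} 2\gamma_j(\lambda)\gamma_{j'}(\lambda) \mathrm{Cov}(P_j, P_{j'}) \]
\[ = \sum_{j=0}^{k} \gamma_j^2(\lambda)\big(\eta^{2j} - (\mu-\lambda)^{2j}\big) + \sum_{0 \le j < j' \le k} 2\gamma_j(\lambda)\gamma_{j'}(\lambda)\big(\eta^{2j}(\mu-\lambda)^{j'-j} - (\mu-\lambda)^{j+j'}\big) \]
\[ = \sum_{j=1}^{k} \gamma_j^2(\lambda)\big(\eta^{2j} - (\mu-\lambda)^{2j}\big) + \sum_{1 \le j < j' \le k} 2\gamma_j(\lambda)\gamma_{j'}(\lambda)\big(\eta^{2j}(\mu-\lambda)^{j'-j} - (\mu-\lambda)^{j+j'}\big) . \qquad (9) \]

Let $t_{jj'} = (\mu - \lambda)^{j'-j}\eta^{2j}\big(1 - \big(\frac{\mu-\lambda}{\eta}\big)^{2j}\big)$. Since $\eta^2 = \sigma^2 + (\mu - \lambda)^2$, we have $|t_{jj'}| \le |\mu - \lambda|^{j'-j}\eta^{2j} \le \eta^{j+j'}$. Taking absolute values on both sides of Eqn. (9), we have,

\[ \mathrm{Var}[\vartheta] \le \sum_{j=1}^{k} \gamma_j^2(\lambda)\eta^{2j} + \sum_{1 \le j < j' \le k} 2|\gamma_j(\lambda)||\gamma_{j'}(\lambda)|\eta^{j+j'} = \Big(\sum_{j=1}^{k} |\gamma_j(\lambda)|\eta^{j}\Big)^2 . \]

□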
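Both parts of the lemma can be sanity-checked on a toy instance. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it takes $\psi(t) = t^p$, draws each $X_l$ from a hypothetical two-point distribution $\mu \pm \sigma$ (so that the mean is $\mu$ and the variance is $\sigma^2$), and enumerates all $2^k$ outcomes to get the exact mean and variance of $\sum_{j=0}^{k} \gamma_j(\lambda) \prod_{l=1}^{j} (X_l - \lambda)$. All function names are made up for the example.

```python
from itertools import product
from math import sqrt


def gen_binom(p, k):
    """Generalized binomial coefficient C(p, k) for real p."""
    out = 1.0
    for j in range(k):
        out *= (p - j) / (j + 1)
    return out


def gamma(p, j, lam):
    """j-th Taylor coefficient of t**p around t = lam, i.e. C(p, j) * lam**(p - j)."""
    return gen_binom(p, j) * lam ** (p - j)


def taylor_estimate(p, lam, xs):
    """sum_j gamma_j(lam) * prod_{l <= j} (x_l - lam) for one realisation xs."""
    total, running = 0.0, 1.0
    for j in range(len(xs) + 1):
        if j > 0:
            running *= xs[j - 1] - lam
        total += gamma(p, j, lam) * running
    return total


def exact_mean_var(p, lam, k, mu, sigma):
    """Exact mean and variance when each X_l is mu - sigma or mu + sigma with probability 1/2."""
    outcomes = [taylor_estimate(p, lam, xs)
                for xs in product((mu - sigma, mu + sigma), repeat=k)]
    mean = sum(outcomes) / len(outcomes)
    var = sum((v - mean) ** 2 for v in outcomes) / len(outcomes)
    return mean, var


def variance_bound(p, lam, k, mu, sigma):
    """(sum_{j=1..k} |gamma_j(lam)| * eta**j)**2 with eta**2 = sigma**2 + (mu - lam)**2."""
    eta = sqrt(sigma ** 2 + (mu - lam) ** 2)
    return sum(abs(gamma(p, j, lam)) * eta ** j for j in range(1, k + 1)) ** 2


p, lam, k, mu, sigma = 2.5, 10.5, 6, 10.0, 0.3
mean, var = exact_mean_var(p, lam, k, mu, sigma)
print(mean, mu ** p, var, variance_bound(p, lam, k, mu, sigma))
```

With these (arbitrary) parameters the mean agrees with $\mu^p$ up to the Taylor remainder, and the exact variance stays below the squared-sum bound.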
Proof of Corollary 2. $\lambda \ge \mu(1 - \alpha) > 0$ since $0 < \alpha < 1$ and $\mu > 0$. Hence, $\psi(t) = t^p$ is analytic in the interval $[\mu, \lambda]$ (or, $[\lambda, \mu]$, depending on whether $\mu < \lambda$ or $\lambda < \mu$).

Let $\vartheta$ abbreviate $\vartheta(t^p, \lambda, k, X_1, \ldots, X_k)$. Note that for the function $\psi(t) = t^p$,

\[ \gamma_k(w) = \frac{1}{k!}\Big(\frac{d^k}{dt^k} t^p\Big)\Big|_{t=w} = \binom{p}{k} w^{p-k} . \]

Applying Lemma 1, there exists $\lambda' \in (\lambda, \mu)$ such that,


|E[^?]-/rP| = |7fc+i(A')|l/^-A|^+' = 


A: + 1 




k+l 


< 


p 


k + l 
P 

k + l 


1p1+i 


P 


P ^ ^ (1 ~ otY ^ ^ , since. A; + 1 > p and by Fact | 


LpJ+1 


a 


1 — a 


k+l 


(1 - 


In particular, if $p$ is integral, then, $\binom{p}{k+1} = 0$ and $E[\vartheta] = \mu^p$.


□ 


Proof of Corollary 0 For ifY) = Jv{^) = ((()A^ We also have from the assumptions that 

+ = + - + +< or, I < f. 

By Lemma m part (2), 


Var [t?] < ( ^ 


Kv=l 




E 

KV=1 


1 ' 


( 10 ) 


The ratio of the $(v+1)$st term in the summation in the RHS to the $v$th term, for $1 \le v \le k - 1$, is

p — V 


u + 1 


p ^ {p-l)V2 1 

A “ 2(25p) 25V2 


Substituting in Eqn. (fTU|l for Var [lA] and using A < p{l + we have. 


/ k \ ^ 

Var[i?] < A^p-^t^V f J^(25V2)"(^"^) J < (1.08)pV^^"^7^ • 


□ 


B Proofs for Averaged Taylor Polynomial Estimator 

Proof of Corollary 5. Choosing $q = 8$ and $\epsilon = 3/4$ in Theorem 4 gives a code $Y \subset \{0,1\}^{8k}$ of binary vectors with exactly $k$ 1's and minimum distance $3k/2$. So, $H_q(\epsilon) = 0.9722648\ldots$ and hence, by Theorem 4, $\log|Y| \ge (1 - H_q(\epsilon))k\log 8$ or, $|Y| \ge 8^{(1 - H_q(\epsilon))k} > 2^{0.08k}$. □
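The numerical constants in this proof can be reproduced directly. The sketch below assumes that $H_q$ denotes the standard $q$-ary entropy function $H_q(x) = x\log_q(q-1) - x\log_q x - (1-x)\log_q(1-x)$; the function name `q_ary_entropy` is hypothetical.

```python
from math import log


def q_ary_entropy(x, q):
    """Standard q-ary entropy H_q(x) for 0 < x < 1 (assumed to be the H_q used in the proof)."""
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)


h = q_ary_entropy(0.75, 8)
# log_2 |Y| >= (1 - H_8(3/4)) * k * log_2(8); this is the resulting exponent per unit of k.
bits_per_k = (1 - h) * log(8, 2)
print(h, bits_per_k)
```

The exponent comes out slightly above 0.08, which is where the bound $|Y| > 2^{0.08k}$ comes from.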

Recall that $Y \subset \{0,1\}^s$ where, $s = 8k$, is a code such that every $y \in Y$ has exactly $k$ 1's, and the minimum Hamming distance between any pair of codewords in $Y$ is at least $3k/2$. Equivalently, $y$ can be written as an ordered sequence $(y_1, y_2, \ldots, y_k)$ where, $1 \le y_1 < y_2 < \cdots < y_k \le s$ are the coordinates of the positions of the 1's in the $s$-dimensional binary vector $y$. For example, let $s = 4$ and $k = 2$; then the vector $(1,0,1,0)$ is written as the 2-dimensional ordered sequence $(1,3)$. We will say that $u \in y$ if $u$ is one of the $y_j$'s in the ordered sequence notation. This notation views the sequence $(1,3)$ above as a set $\{1,3\}$.

Given codewords $y, y' \in Y$, $y \cap y'$ denotes the set of indices that are 1 in both $y$ and $y'$. Let $\pi : [k] \to [k]$ be a permutation and $y = (y_1, \ldots, y_k)$ be an ordered sequence of size $k$. Then, $\pi(y)$ denotes the sequence $(y_{\pi(1)}, y_{\pi(2)}, \ldots, y_{\pi(k)})$. The prefix-segment of $\pi(y)$ consisting of its first $v$ entries is $(y_{\pi(1)}, \ldots, y_{\pi(v)})$. Let $y, y'$ be ordered sequences of length $k$ and let $\pi, \pi'$ be permutations mapping $[k] \to [k]$. Let $Q^{vv'}_{yy'\pi\pi'}$ denote the set of common indices shared among the first $v$ positions of $\pi(y)$ with the first $v'$ positions of $\pi'(y')$, that is,

\[ Q^{vv'}_{yy'\pi\pi'} = \{y_{\pi(1)}, y_{\pi(2)}, \ldots, y_{\pi(v)}\} \cap \{y'_{\pi'(1)}, y'_{\pi'(2)}, \ldots, y'_{\pi'(v')}\} . \]

Let $q^{vv'}_{yy'\pi\pi'}$ denote the number of common indices, that is,

\[ q^{vv'}_{yy'\pi\pi'} = |Q^{vv'}_{yy'\pi\pi'}| . \]

Given distinct codewords $y, y' \in Y$ and permutations $\pi$ and $\pi'$, $Q^{vv'}_{yy'\pi\pi'}$ is abbreviated as $Q^{vv'}$ and $q^{vv'}_{yy'\pi\pi'}$ as $q^{vv'}$.

In the remainder of this section, we will assume that $Y$ is a code of $s = 8k$-dimensional boolean vectors of size exponential in $k$, as given by Corollary 5. The function for the Taylor polynomial estimator will be $\psi(t) = t^p$. Let $\vartheta_y$ abbreviate the estimator $\vartheta_y = \vartheta(t^p, \lambda, k, s, y, \pi_y, \{X_l\}_{l=1}^{s})$,

where, $\lambda$ is some parameter.
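The ordered-sequence notation and the common-prefix set $Q^{vv'}$ can be made concrete with a small sketch. The function names below are hypothetical, and permutations are represented as 1-based tuples, which is an assumption of this example rather than anything fixed by the text.

```python
def positions(bits):
    """Ordered 1-based positions of the 1's in a binary vector."""
    return tuple(i + 1 for i, b in enumerate(bits) if b == 1)


def permuted(y, pi):
    """pi(y) = (y_{pi(1)}, ..., y_{pi(k)}), with pi given as a 1-based tuple."""
    return tuple(y[pi[j] - 1] for j in range(len(y)))


def common_prefix_indices(y, y2, pi, pi2, v, v2):
    """Indices shared by the first v entries of pi(y) and the first v2 entries of pi2(y2)."""
    return set(permuted(y, pi)[:v]) & set(permuted(y2, pi2)[:v2])


y = positions((1, 0, 1, 0))    # the example from the text: (1, 3)
y2 = positions((0, 1, 1, 0))   # a second codeword sharing index 3
Q = common_prefix_indices(y, y2, (2, 1), (1, 2), 1, 2)
print(y, y2, Q, len(Q))
```

Here `len(Q)` plays the role of $q^{vv'}$ for the chosen permutations and prefix lengths.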


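B.1 Covariance of $\vartheta_y, \vartheta_{y'}$

Lemma 22. Let $q = 8$, $k \ge 1$ and $s = qk$. Let $Y$ be a code satisfying Corollary 5. Let $\{X_1, \ldots, X_s\}$ be a family of independent random variables, each having expectation $\mu > 0$ and variance bounded above by $\sigma^2$. Let $\lambda$ be an estimate for $\mu$ satisfying $|\lambda - \mu| \le \min(\mu, \lambda)/(25p)$ and let $\sigma < \min(\mu, \lambda)/(25p)$. Let $\eta = ((\lambda - \mu)^2 + \sigma^2)^{1/2} > 0$. Let $\bar{\vartheta}$ denote $\bar{\vartheta}(t^p, \lambda, k, s, Y, \{\pi_y\}_{y \in Y}, \{X_l\}_{l=1}^{s})$ and let $\vartheta_y$ denote the estimator $\vartheta_y = \vartheta(t^p, \lambda, k, s, y, \pi_y, \{X_l\}_{l=1}^{s})$. Then, for $y, y' \in Y$ and $y \ne y'$,

\[ \mathrm{Cov}(\vartheta_y, \vartheta_{y'}) \le \begin{cases} \sum_{v,v'=1}^{k} \gamma_v(\lambda)\gamma_{v'}(\lambda)(\mu - \lambda)^{v+v'} \Big( E_{\pi_y, \pi_{y'}}\Big[\Big(\frac{\eta^2}{(\mu - \lambda)^2}\Big)^{q^{vv'}_{yy'\pi_y\pi_{y'}}}\Big] - 1 \Big) & \text{if } \mu \ne \lambda, \\ \sum_{v=1}^{|y \cap y'|} \gamma_v^2(\lambda)\,\eta^{2v} \Pr\big[q^{vv}_{yy'\pi_y\pi_{y'}} = v\big] & \text{if } \mu = \lambda. \end{cases} \]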

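Proof of Lemma 22. By definition, $\bar{\vartheta} = \frac{1}{|Y|}\sum_{y \in Y} \vartheta_y$. Fix $y, y' \in Y$ with $y \ne y'$ and let $\pi = \pi_y$ and $\pi' = \pi_{y'}$ abbreviate the random permutations corresponding to $y$ and $y'$. Let $q^{vv'}_{yy'\pi_y\pi_{y'}}$ be denoted by $q^{vv'}$. Now,

\[ \Big(\sum_{v=0}^{k} \gamma_v(\lambda)(\mu - \lambda)^v\Big)^2 = \sum_{v=0}^{k}\sum_{v'=0}^{k} \gamma_v(\lambda)\gamma_{v'}(\lambda)(\mu - \lambda)^{v+v'} . \]

Further, from the definition of $\vartheta_y$ and $\vartheta_{y'}$ and by linearity of expectation,

\[ E[\vartheta_y \vartheta_{y'}] = E\Big[\Big(\sum_{v=0}^{k} \gamma_v(\lambda) \prod_{l=1}^{v} (X_{y_{\pi(l)}} - \lambda)\Big)\Big(\sum_{v'=0}^{k} \gamma_{v'}(\lambda) \prod_{m=1}^{v'} (X_{y'_{\pi'(m)}} - \lambda)\Big)\Big] = \sum_{v,v'=0}^{k} \gamma_v(\lambda)\gamma_{v'}(\lambda)\, E\Big[\prod_{l=1}^{v} (X_{y_{\pi(l)}} - \lambda) \prod_{m=1}^{v'} (X_{y'_{\pi'(m)}} - \lambda)\Big] . \]

Fix $\pi, \pi'$. There are $q^{vv'}$ indices that are common among the first $v$ positions of $\pi_y(y)$ and the first $v'$ positions of $\pi_{y'}(y')$. This set of common indices is given by $Q^{vv'} = \{y_{\pi(1)}, \ldots, y_{\pi(v)}\} \cap \{y'_{\pi'(1)}, \ldots, y'_{\pi'(v')}\}$. Also, let $\bar{Q}^{vv'}$ denote the union $\{y_{\pi(1)}, \ldots, y_{\pi(v)}\} \cup \{y'_{\pi'(1)}, \ldots, y'_{\pi'(v')}\}$. Hence we have,

\[ \prod_{l=1}^{v} (X_{y_{\pi(l)}} - \lambda) \prod_{m=1}^{v'} (X_{y'_{\pi'(m)}} - \lambda) = \prod_{i \in Q^{vv'}} (X_i - \lambda)^2 \prod_{i \in \bar{Q}^{vv'} \setminus Q^{vv'}} (X_i - \lambda) . \]

Taking expectation,

\[ E\Big[\prod_{l=1}^{v} (X_{y_{\pi(l)}} - \lambda) \prod_{m=1}^{v'} (X_{y'_{\pi'(m)}} - \lambda)\Big] = E_{\pi_y, \pi_{y'}}\Big[ E\Big[\prod_{i \in Q^{vv'}} (X_i - \lambda)^2 \prod_{i \in \bar{Q}^{vv'} \setminus Q^{vv'}} (X_i - \lambda) \,\Big|\, \pi_y, \pi_{y'}\Big]\Big] \]
\[ = E_{\pi_y, \pi_{y'}}\Big[ \prod_{i \in Q^{vv'}} E[(X_i - \lambda)^2] \prod_{i \in \bar{Q}^{vv'} \setminus Q^{vv'}} E[X_i - \lambda] \,\Big|\, \pi_y, \pi_{y'}\Big] = E_{\pi_y, \pi_{y'}}\Big[\eta^{2q^{vv'}} (\mu - \lambda)^{v + v' - 2q^{vv'}}\Big] \]

by independence of the $X_i$'s for $i \in [s]$.
Therefore,

\[ \mathrm{Cov}(\vartheta_y, \vartheta_{y'}) = E[\vartheta_y \vartheta_{y'}] - E[\vartheta_y]E[\vartheta_{y'}] \]
\[ = \sum_{v,v'=0}^{k} \gamma_v(\lambda)\gamma_{v'}(\lambda)\Big( E\Big[\prod_{l=1}^{v} (X_{y_{\pi(l)}} - \lambda) \prod_{m=1}^{v'} (X_{y'_{\pi'(m)}} - \lambda)\Big] - (\mu - \lambda)^{v+v'}\Big) \]
\[ = \sum_{v,v'=0}^{k} \gamma_v(\lambda)\gamma_{v'}(\lambda)\Big( E_{\pi_y, \pi_{y'}}\Big[\eta^{2q^{vv'}} (\mu - \lambda)^{v+v'-2q^{vv'}}\Big] - (\mu - \lambda)^{v+v'}\Big) \]
\[ = \sum_{v,v'=1}^{k} \gamma_v(\lambda)\gamma_{v'}(\lambda)\Big( E_{\pi_y, \pi_{y'}}\Big[\eta^{2q^{vv'}} (\mu - \lambda)^{v+v'-2q^{vv'}}\Big] - (\mu - \lambda)^{v+v'}\Big) \qquad (11) \]

where the last step follows by noting that if $v = 0$ or $v' = 0$, then $q^{vv'} = 0$ and so, $\eta^{2q^{vv'}}(\mu - \lambda)^{v+v'-2q^{vv'}} = (\mu - \lambda)^{v+v'}$. Hence the summation indices $v, v'$ in (11) may start from 1 instead of 0.

Case 1: $\mu = \lambda$. If $v \ne v'$, then, $2q^{vv'} \le 2\min(v, v') < v + v'$. Hence, the term $(\mu - \lambda)^{v+v'-2q^{vv'}} = 0$. In this case, Eqn. (11) becomes

\[ E[\vartheta_y \vartheta_{y'}] - E[\vartheta_y]E[\vartheta_{y'}] = \sum_{v=1}^{|y \cap y'|} \gamma_v^2(\lambda)\,\eta^{2v} \Pr[q^{vv} = v] . \qquad (12) \]

Case 2: $\mu \ne \lambda$. Then, Eqn. (11) can be written as

\[ \mathrm{Cov}(\vartheta_y, \vartheta_{y'}) = \sum_{v,v'=1}^{k} \gamma_v(\lambda)\gamma_{v'}(\lambda)(\mu - \lambda)^{v+v'}\Big( E_{\pi_y, \pi_{y'}}\Big[\Big(\frac{\eta^2}{(\mu - \lambda)^2}\Big)^{q^{vv'}}\Big] - 1\Big) . \qquad (13) \]

This proves the Lemma. □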


Let $Y$ be a code satisfying the properties of Corollary 5 and let $y, y' \in Y$ be distinct such that $t = |y \cap y'|$. Let $\pi_y, \pi_{y'}$ denote randomly and independently chosen permutations from $[k] \to [k]$.

Define


k 

P,y = E 

v,v'=l 






r=l 


_ v 

(h - A) 


r 

Pr 

'^'lTy,Wy, 



Qyyl — A ^ 

l<v,v'<k 









(14) 

(15) 


Corollary 23. Assume the premises and notation of Lemma 22 and let $\mu \ne \lambda$. For $y, y' \in Y$ and $y \ne y'$ such that $t = |y \cap y'|$, let $\pi_y, \pi_{y'}$ denote randomly and independently chosen permutations from $[k] \to [k]$. Then,

\[ \mathrm{Cov}(\vartheta_y, \vartheta_{y'}) \le P_{yy'} + Q_{yy'} . \]

Proof. Since $\psi(x) = x^p$, $\gamma_v(\lambda) = \binom{p}{v}\lambda^{p-v}$. The Corollary follows by substituting this into Lemma 22. □


B.2 Probability of overlap of prefixes of $y$ and $y'$ after random ordering

Lemma 24. Let $Y$ be a code satisfying the properties of Corollary 5. Let $\{\pi_y\}_{y \in Y}$ be a family of random and independently chosen permutations from $[k] \to [k]$. For distinct $y, y' \in Y$ with $t = |y \cap y'|$,

\[ \Pr\big[q^{vv'}_{yy'\pi_y\pi_{y'}} = r\big] = \sum_{s \ge 0} \frac{\binom{t}{r+s}\binom{r+s}{r}\binom{k-t}{v-(r+s)}\binom{k-(r+s)}{v'-r}}{\binom{k}{v}\binom{k}{v'}} . \]

Proof. Fix distinct $y, y' \in Y$ and let $t = t(y, y') = |y \cap y'|$. By notation, $\pi_y(y)[v]$ is the $v$-sequence $\tau = (y_{\pi_y(1)}, \ldots, y_{\pi_y(v)})$ and $\pi_{y'}(y')[v']$ is the $v'$-sequence $\nu = (y'_{\pi_{y'}(1)}, \ldots, y'_{\pi_{y'}(v')})$. The permutations $\pi_y$ and $\pi_{y'}$ are each uniformly randomly and independently chosen from the space of all permutations $[k] \to [k]$ (i.e., $S_k$).

The problem is to count the number of ways in which the $v$ positions in $\tau$ and the $v'$ positions in $\nu$ can be filled, using the elements of $y$ and $y'$ under permutations $\pi_y$ and $\pi_{y'}$, such that $\tau \cap \nu$ has exactly $r$ elements. Since $\pi_y$ and $\pi_{y'}$ are uniformly random and independent permutations, the sample space has size $k^{\underline{v}} \cdot k^{\underline{v'}} = \binom{k}{v} v! \binom{k}{v'} v'!$. There are $t$ elements in common among $y$ and $y'$ and we wish for $\tau$ and $\nu$ to have $r$ elements in common. Suppose $\tau$ has $r + s$ elements from the $t$ elements in common, where, $s$ ranges from 0 to $\min(t - r, v - r)$. These are selected in $\binom{t}{r+s}$ ways. Having chosen these elements, we select $r$ elements in $\binom{r+s}{r}$ ways; these elements are included in $\nu$ as well. We have now filled $r + s$ positions of $\tau$ and $r$ positions of $\nu$. The remaining $v - (r + s)$ positions of $\tau$ may be filled out of the $k - t$ elements of $y$ that are not common with $y'$. This is done in $\binom{k-t}{v-(r+s)}$ ways. There are $v' - r$ positions remaining to be filled in $\nu$. There are $k - t + (t - (r + s)) = k - (r + s)$ elements to choose from, which can be done in $\binom{k-(r+s)}{v'-r}$ ways. The $v$ elements chosen for $\tau$ and the $v'$ elements chosen for $\nu$ can be rearranged in $v!$ and $v'!$ ways. Thus,

\[ \Pr\big[q^{vv'}_{yy'\pi_y\pi_{y'}} = r\big] = \sum_{s \ge 0} \frac{\binom{t}{r+s}\binom{r+s}{r}\binom{k-t}{v-(r+s)}\binom{k-(r+s)}{v'-r}\, v!\, v'!}{k^{\underline{v}} \cdot k^{\underline{v'}}} \]

which proves the lemma.

□
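The counting argument in this proof can be cross-checked by brute force for small $k$. The sketch below enumerates all pairs of permutations and compares the empirical probability that the two prefixes share exactly $r$ indices with the closed form obtained from the counting steps, written with $u = r + s$ as the number of common elements placed in the first prefix. Both function names are hypothetical, and the closed form is this editor's reading of the argument rather than a quoted formula.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def overlap_prob_bruteforce(y, y2, v, v2, r):
    """Exact Pr[the two prefixes share r indices], over all pairs of permutations of [k]."""
    k = len(y)
    hits = total = 0
    for pi in permutations(range(k)):
        prefix = {y[j] for j in pi[:v]}
        for pi2 in permutations(range(k)):
            prefix2 = {y2[j] for j in pi2[:v2]}
            total += 1
            hits += (len(prefix & prefix2) == r)
    return Fraction(hits, total)


def overlap_prob_formula(k, t, v, v2, r):
    """Closed form from the counting argument; u = r + s common elements land in the first prefix."""
    if r > v2:
        return Fraction(0)
    num = sum(comb(t, u) * comb(u, r) * comb(k - t, v - u) * comb(k - u, v2 - r)
              for u in range(r, min(t, v) + 1))
    return Fraction(num, comb(k, v) * comb(k, v2))


# Toy instance: k = 4, the two codewords share t = 2 indices.
y, y2 = (1, 2, 3, 4), (3, 4, 5, 6)
print(overlap_prob_bruteforce(y, y2, 2, 3, 1), overlap_prob_formula(4, 2, 2, 3, 1))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the two quantities can be compared for equality rather than up to rounding.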


B.3 Estimating $Q_{yy'}$

Lemma 25. Assume the premises and notation of Lemma 22 and Corollary 23. Let $p > 2$ and let $y, y' \in Y$ be distinct. If $\mu \ne \lambda$, then $Q_{yy'} \le 0$.

Proof. Fix distinct $y, y' \in Y$ and let $Q$ denote $Q_{yy'}$. Let $\alpha = (\mu - \lambda)/\lambda$. Then,

\[ Q = -Q_1 + Q_2 \]


where, 


Qi 


E 

l<v,v^<k 


P 




v-\-v' _ 



Q2 


E 

l<v,v'<k 







(18) 

(19) 


Consider Ylt=i 


(O)a’'. The absolute value of the ratio of the v + 1st term 


n = 1, 2 ,..., /c — 1, is 


v + 1 - \2J 2bp - 50 


to the nth term, for 


25 







Therefore 



<ipa)'£m 

V>1 




jpg) 

49 


Therefore, 

Qi = A2Ppa(^l±l) eX‘^Ppa(^l±^^ 
Consider Q 2 . Let t = t{y,y') = \yriy'\. 


where, 



( 20 ) 


( 21 ) 


Consider Rut- The absolute value of the ratio of the {v + 1)*^* term in the summation Rut to the 
uth term for max(u, l)<u</c — f + u—lis 


\p — v\ /k — t — v + uX /u + l\ /p\ /k — V — {t — u)\ 1 1 

v + 1 \ v — u + 1 j yfc —uy*^“V2/\ k — v j 25p “ 50 


Case 1: u <1. Then, 


Case 2: u>2. Then, 


In either case. 


R^ 


ut 


pa 

^ T 



Rut s 


)a 


0 V 


(i±^ 

49 


Rut G 


( p 
Vmax(u,l) 


)« 


max(u,l) 


( ^ ) 

\max(ti,l)/ 




( 22 ) 



















/fc— 

Now consider Sut = Z]d=i (^) ^ absolute value of the ratio of the v + 1th term in 

\vj 

the summation Sut to the vih term, u = 1,2,..., A: — ti — 1 is 

Ip —ul f k — u — v\ /V + 1\ pa 1 

' ' ' ' a < — < — . 


u + 1 V u + 1 


k — V 


2 “ 50 


Therefore, 


Sut G 


p{k — u)a 
k 


1 

1 zb — 

49 


(23) 


Substituting Eqns. (1221) and ([23]) in Eqn. (|2T]) . we have, 


u={) 

1 


G 1± 


49 


ut 


t ft 


i±±]x^py 

49 I ^ 


u=0 


(max(u,l)) ^ 


(24) 


Consider the summation term in Eqn. 


0 - n) ^ ^ Q (P)a-ik - u) 

- / fc X - = pa + y^ ' 


u=0 


( ^ ] 
\max(ii,l)/ 


u=l 


0 


(25) 


Consider the summation term in Eqn. (|25l) . The ratio of the absolute value of the u + 1st term to 
the uth term, for l<u<t—lis 


t — u\ /|p — u|\ /u + l\ fk — u — 1 
u + l/ V^t + l/ Vfc — \ k — u 


since, t < k/A from the property of the code Y. 
Therefore, from Eqn. (|25l) . 

* CUP 


a < 




p tpa{k - 1) / 

^ k^ \ 199 

u=l \u) 


pa 


Izb 


199 


since t < A:/4. 

Substituting in Eqn. ((24l) . we have, 

(?) (--?(- 1 ?)) ^ (^) 

Using Eqns. (1201) and (1^ . we have, 

Qi - Q2 > X^^ipa)^ 

> 0 


(26) 


since, k > 3. Hence, Q = —Qi + Q 2 < 0. 


□ 
B.4 Estimating $P_{yy'}$

Notation. Let $Y$ be a code satisfying Corollary 5. Let $y, y' \in Y$ be distinct and let $t = |y \cap y'|$. Let $P$ denote $P_{yy'}$. Let $\alpha = (\mu - \lambda)/\lambda$ and $\beta = \eta^2/\lambda^2$. Define


= E > 


u=l ^ ^ r=l 


Ip^IIp^II 


a\ 




*- r >p,pnon-integral 


• ((1 - \a\)P-^+P-^ + 2(1 - + ( 27 )-(^-®^)) 1 

C) e i;y ■ ((' - i“i)-“+(I) i.^P< 


u,p non-integral 


t / ^ \ u 


^3-a^^e(:)e(!V^'^"'""“' 






/50 


^u<p 


u=l ^ ^ r=l ^ ^ 

Lemma 26. Assume the premises and notation of Lemma 22 and Corollary 23. Let $y, y' \in Y$ be distinct and let $\pi = \pi_y$ and $\pi' = \pi_{y'}$ be random permutations from $[k] \to [k]$. Let $\alpha = (\mu - \lambda)/\lambda$ and $\beta = \eta^2/\lambda^2$. Then, $P_{yy'} = 0$ if $p$ is integral, and otherwise, $P_{yy'} \le P_1 + P_2 + P_3$.


Proof. Let P denote Pyy'. Then, 


P = X^P 

v^v'=l 

v,v'=l 


^ 'p\fp\aV+v'-2r 


Vj \V‘ 

p\(p 

V j \v‘ 


E/J-p 


t 


r=l 






1 


=r 


■E 


t\ fu\ f k — t\ f k — u 


1 Py 1 

u=\ ^ ' r=l 


"='E : E L' E 


1 (d (/) ^ W \p 

r=l \v) \v'/ u=r \ 2 \ 

k /X (k-t\ \ / k 

\v—u) 


\v=u 


0 


■ a 


E 


\v—r 


V — uj \v' — r 

k—u 


/ k—u\ 
\v'—r) 

“er 


a 


1 Py 1 

!i=l ' ^ r=l 




where, 


\v/ \v—uJ 


Pur — ^ ^ 


a 


o 


and K.= E- ^ 


e) 


We first obtain upper bounds on Uur and Vur- 


k (P)(>^-C 

\v/ \v—u/ 


Uur — ^ ^ 

v=u 

k 

= E 


a 


() 

v=u \vj 

^ pa yp-u\ I k-t\ v-u+(u-r) 
\v — u)\v — u)^ 


fca 

\v—u) 


k—u 


7» 7/_7^ r\j LL / \ /k — i\ 

sr^ ip-u\ [ w ) 


E 

ti;=0 


W 


(Y) 


a 


(27) 


(28) 


(29) 
by letting $w = v - u$.

Case U.l: u > p. Note that if p is integral then Uur = 0. Otherwise, sgn((^““)) = (—1)^ 
Using this and since 0 < t < u, we have, 


k—u / X lk—t\ 

n V ^ / /fc-M) 

w=0 ^ ^ \ w J 


< (^^^)(-iri«r = (1 - |al)^"“+ (30) 

w=0 


p — u 
k — u + 1 


for some 7 G (—|a|, 0), by Taylor’s series expansion of (1 — |a|)^ “ around 0 up to A: — tt terms. 
Now, for u > p, 1 < u < t < k/4, we have. 


P ^ \ fc-u+l 
A; — u + 1 




k-u+i < f {k-p)e\a\ \^ ^ 


k — u + I 


< (27)- 


(31) 


since, 1 < u < t < k/4 and |a| < ^ 

k—u (^~*) 

Case U.2: u < p. Consider Yluj^o (k-ui ■ Let the rcth term in the summation be r^, 

\ w ) 

for 0<rc<A; — u — 1. Then, for l<rt;<A: — u — 1, 


Tw+l 


Ip — ti — u;| 
tc + 1 


a 


k — t — w\ ^ 1 
k — u — w ) ~ 50 


since, (a) 1 < u < t and A; — u — rc > 1, and, (b) | 

Therefore, 


k—u , ^ (k—t\ 

ifoV “ A (7") 


< 


E(™)‘ 


W>1 


49 


Combining Cases U.l and U.2, we have, 

|p^ ||a|““ 


Uiir E 


A;“ 


((1 - \a\Y-^ + (27)-(3/4)^) non-integral + 


(32) 


Case V: Proceeding similarly for evaluating 14r) we have, 


Vur = Yl 

v=r 

k 

= E 


k (P)(k-u\ v-r 
\v/ \v—r/ ^ 

F) 

v=r \vJ 

k n-r 

V— \v—r/ \v—r/ 

^ //c-n 

v=r V— \v—r/ 

pLUPUUIo” 

kt ^ 


w=0 


(T) 


Case V.l: r > p. We note that if p is integral then = 0 and therefore Vur = 0. Otherwise, 
sgn((V)) = (-!)"'• Thus, 
/ /k—r\

\ w ) 


w=0 




sE 

k—r 

sEI 

ii;=0 

k—r 

= E 

ti;=0 


//c-r\ 
\ w ) 


p — r 
w 

p — r 
w 


llal*^, since, k > u > r > 1, 

(—|a|)"',for some 7 G (—|q!|, 0 ), 


< (1- la|f-’' + (27)-(3/^)^ 

following the same argument as in Eqn. (|3ip . and using 1 < r < t < A:/4. Thus, 


Kr- < 


Case V.2: r < p. Consider the ratio of the absolute value of the w + 1st term, denoted to 

/k-u\ 

the wih. term Vyj of the summation )k-r\ ■ Then, 

\ w ) 




\p — r — w\\ f k — u — w 
a 


tc + 1 


k — r — w 


<(?)«< 


50 


Therefore, 




and so, 


T 

\Vur\ e^{i± 


kr- 


49 


Combining Cases V.l and V.2 gives 

\Vur\ < ^ (((1 - \a\r-^ + (27)-(3/4)fc) 1 
Substituting Eqn. (I32|) and ([331) in Eqn. ([ST]), we have. 


r>p,p non-integral T ' lr<p 


(33) 




u=l ^ r=l 

< ( ) ^ ( ]/3''\Uu,r\\Vu, 


1 1 

u=l ^ ^ r=l 


(34) 
Now, since $1 \le r \le u \le t \le k/4$, we have,


1 ^ur 1 * 1 

^ur 1 




\(y.\ 




(^((1 - \a\Y-^ + (27)-(3/4)fc) l.>p,p non-integral + ^ ' l.<p) 


Lr>p,p non-integral “1“ * lr<p 


: ^ (((1 “ |a|)P-“+P-'’ + 2(1 - |a|f-“(27)-(3/^)*^ + (27)-(^-®^)) 1 

+ ((1 - \a\y-^ + (27)-(3/4)^) lr<p<«,p non-integral) + 


r>p,p non-integral 


Therefore, 




< 


u 

u=l 

* -A /w\ / |p^ I ||a| 


1 1 
?x=l ^ ^ r=l 


^^'E E h 


r / \ I 

1 _ |a|)P-“+P-’' + 2(1 - |a|f-“(27)-(3/4)fc + (27)-(l-5'^)) ln>p,p non-integral 

+ ((1 - \a\r-^ + (27)-(3/4)^) (^^) l,<p<,,p„„„_i,tegral 




= Pi + P2 + P3 


□ 


B.4.1 Estimating $P_3$

Lemma 27. Assume the premises and notation of Lemma 22 and Corollary 23. Let $y, y' \in Y$ be distinct and let $\pi = \pi_y$ and $\pi' = \pi_{y'}$ be random permutations from $[k] \to [k]$. Let $\alpha = (\mu - \lambda)/\lambda$ and $\beta = \eta^2/\lambda^2$. Then,

\[ P_3 \le \frac{0.275p^2}{k}\lambda^{2p}\beta . \qquad (35) \]










Proof. Consider the sum P 3 . 


.u / ' \r J \ kl^ 

u=l ^ ' r=l ^ ' \ 


^ ^ r=l 
min(p. 



U=1 


t\ (fp- 
kii- 





' cn\ 2 min(p,t) 

<|“i A=f ^ i‘E?, la 

U=1 

t-r, X 2 mm{p,t) 

-) y ] ^Y\a 

49; ' 



U=1 

t 


uj \k/ 
t\ fp'' 


< 1^1 

^ \uj \k 

U=1 ^ ' 

2 

A 2 p (P 3 , _ P 32 ) 




\a\ 




(36) 


where, 


Let a = (1 + 



p\a\ 


1 + 


Pl_ 

k\a\ 


- 1 


^31 - Pi2 = A -h^ <{a- h){tA 

<l^)Wexp{(*-l)y 

p‘^/3 (p\a\ p‘^/3 

/111 
^ IT ""Pj 100 + ^1 

^ (1.0102)^2/3 

- Ak 


1 + 


pW\ 

k 


k\a\ 


- 1 


Therefore, 


Therefore, substituting in Eqn. (36), we have,

\[ P_3 \le \frac{0.275p^2}{k}\lambda^{2p}\beta = \frac{0.275p^2}{k}\lambda^{2p-2}\eta^2 . \]

□
B.4.2 Estimating $P_2$

We now consider $P_2$.

Lemma 28. Assume the premises and notation of Lemma 22 and Corollary 23. Let $y, y' \in Y$ be distinct and let $\pi = \pi_y$ and $\pi' = \pi_{y'}$ be random permutations from $[k] \to [k]$. Let $\alpha = (\mu - \lambda)/\lambda$ and $\beta = \eta^2/\lambda^2$. Then,

\[ P_2 \le \frac{p^2\lambda^{2p}\beta}{(30)(40)k} . \qquad (37) \]


Proof. 


: E 


U=1 


r=l 

t 




k^ 


U=\j)\+1 


bJ 


r=l 


- J • ((1 - + (27)-(3/4)fc^ (38) 


E ^ 


Ip^IIp^II 


A:“ k^ 


The first summation is empty if $t < \lfloor p \rfloor + 1$, in which case $P_2 = 0$. Also, $P_2 = 0$ if $p$ is integral, since $p^{\underline{u}} = 0$ for $u \ge \lfloor p \rfloor + 1$. So we now assume that $t \ge \lfloor p \rfloor + 1$ and $p$ is not integral. Further, $(1 - |\alpha|)^{p-u} \ge 1$ for $u \ge \lfloor p \rfloor + 1$ and $|\alpha| \le 1/(50p)$. Hence, $(27)^{-(3/4)k} + (1 - |\alpha|)^{p-u} \le (1 - |\alpha|)^{p-u}(1 + (27)^{-(3/4)k})$. Using this simplification and also using the fact that $p^{\underline{r}}/k^{\underline{r}} \le (p/k)^r$ for $1 \le r \le \lfloor p \rfloor$, Eqn. (38) can be written as follows.


^>2< (f) (i + {27riw)A2p E 

^ ^ u=[p\+l 



|a|) 


p—u 


< (1.042)(1 


t 1p1 

E E 

«=LpJ+i ^=1 






r 


where, 7 = 
Let 


so that 


Q 2 


t 1p1 

E E 








r 


P 2 < (1.042)e-l"IPA2PQ2 < (1.001)A2 pQ2 • 


(39) 
Then,



LpJ 

E 

r=l 

LpJ 

E 


A f pP 



E 

11= LpJ+1 


\a\k 
't\ ( pfi \ ( 


t - r\lf- \{p-r)— 


u-rj 


^7 


,r-\-u—r 



t—r 


. . ,. , . .Y T 

—f V r / I lalfc / \k^J ^ 

—1 u—r=\_p\+l—r 




t - r\ \ {p- r 


u-rJ (k-r)^ 


(40) 


Consider the inner summation in Eqn. (j^O]) . namely, 

t—r 


E 

w=lp\-\-l—r 




t -r\ \[p- r)— 


w J {k-r)^ 


•7 


(41) 


The ratio of {w + l)st term to the reth term, for w = [pj + 1 — r,... , t — r — 1, in the above 
summation is 


t — r — w\ np — r — w\ 
u; + l j \ k — r — w 


7 = 


t — r — w\ (w — {p — r) 


k — r — w 


tp + 1 


7 < — < 


1 


< 


1 


k - (4)(50p-l) “ (4) (49) 


since t < k/A and 7 = < 50 ^^ — 'h' Therefore, Eqn. (1411) may be upper bounded as follows. 


t—r 


E 

-uj=\_p\+l—r 


A 1^ 


t-r\ \ {p-r)- 


w J (k-r)^ 


.7 


< 


t — r 


\{P- 


, LpJ+l-r 


Xp\ + l-rj (/j _ r'jM+lzL 
Substituting in Eqn. ()40]l . we have. 


7 


[pj+l—r 


1 + 


195 


LpJ 




[pj+l—r 


LpJ 

< (1.0052) 

r=l 

LpJ 


t\ / t — r 



p 


ipj+i 


7 L 


(1.0052)^ 


\j)\ + l — rJ \|a|A: 2 y \ (fc — r)iEi±lz!l 

t \flPi+Af Yf\ip-r)^P^ 


^VLpJ + vV r J - \a\)k'^ J ^ (fc _ r-)LpJ+i-^ 


7 


[pj+l-r 


= (1.0052)5 


(42) 
Consider the summation above and let tr = (i-\a\)k ^ ) '’jLpJ+i-r J 7 ^^^+^ ^ be the rth

term. Let = argmax|;^^tr, that is is the largest among the trS. Then, clearly, S = tr < 

\.P\tr^- For = r e { 1 , 2 ,... , [pj}, we have, 


S<[p\ 


LpJ +1 


[p\ +1 




(p-r) 


[pj+l-r 


r / \{l-\a\)k^) j ^ 

\f Y({p-r)M+lzL 


[pj+l-r 


f lpjj 

V rl y I A:^(A;-r)M±iz!i / V(1 - l«l)^y 


7 


[pj-r 


Now 


pY 


< 




< 


(l-|a|)A: {l-^)k 


(l.Oliy/3 

k 


since p >2. 

Therefore, Eqn. (j43p may be written as 


(4)-(W«), since, ^ < i t < 

k^rl r k - A ' - 


< 


(50p - 1) 


< 


p‘^j3 


since, p > 2 and p'^P <C 1 . 


(30)(49)A:’ 

Substituting in Eqn. (|^^ . we have that Q 2 < (1.0052)5' and from Eqn. p9]l . we have, 

p^A^P/3 


P 2 < (1.001)A2pQ2 < 


(30)(40)fe 


(43) 


(44) 


□ 


B.4.3 Estimating $P_1$

We now calculate $P_1$.

Lemma 29. Assume the premises and notation of Lemma 22 and Corollary 23. Let $y, y' \in Y$ be distinct and let $\pi = \pi_y$ and $\pi' = \pi_{y'}$ be random permutations from $[k] \to [k]$. Let $\alpha = (\mu - \lambda)/\lambda$ and $\beta = \eta^2/\lambda^2$. Then for $n \ge 2$,


a < (0,3) ( 


(2)(25)2p 


(a-1) 


(45) 


Proof. 


a-a-e : E 


U=1 


r=l 






ky^k^ 


((1 - |a|)P-“+P-" + 2(1 - |a|)P-“(27)-(^/^)^ + (27)-(i'^^)) lr>p 


(46) 
First, we note that for k > clogn (where, c = 100 as per Table [2|), (27)^ ?>/^)k _ ^ (3.5)c ^
n-i^-^y{l-\a\y-^. Hence, ((1 - \a\y-^+P-^ + 2(1 - |a|)P-“(27)-(3/4)*: + {27)-^^-^^'^) = (l-la|)2p-“-’'(l+ 
(^(j^-( 3 . 5 c))^ Therefore, 


P. = (l + 0(n-“=))A2>'J^y2y;5 

= (1 + ^ ^ 

'!i=LpJ+l r’=LpJ+l 
= (1 + 0(n-3-5'^)) A^PL 


|p^ ||p^ |l 


a 


ky- kr- 


(\V^\\P^ ||a 



u \r, 


V kiy kn 


(1 - |a|)P-“+P-’'l,>p 
■ ) (1 - |a|) 2 p-“-’ 


(47) 


where, 


E E 

«=LpJ+ 1 r=[pj+l 





J \ k^- ky 

Let a = [pj + 1 and let ?; = u — a and w = r — a. Then, 


(1 - laD^P- 


E(‘r)Eu,"“' 

^ ^ „=0 ^ ■' w=0 


P 


(a + v-p-iy (a + w-p-l)^ 


(1 — |a|)y \ {k — a)-{k — a)—{w + a)- 

Now a — p + w — 1— < w— = w\. Similarly, a — p + v — 1= t>!. Therefore, 
''t — a\ / V 


(1 - |a|)" 



V / \w 


(a + v-p-iyia + w-p-l)^ <{t-ayvii^ . 


Hence, 


L < (1 - \a\yP-^^pH^ ) (1 _ |«|)- 


k^ J ^\(k- ay 

i ;=0 ^ ^ ^ 


E 


V— 


- «)- 
w=0 ' ' 

a \ 2 
pu 


a 


P 


1 


(l-|a|)y \{w + a)- 

t—a V 


where c = 


<(l_|«|)2p-2a/3a^a ^ 1 + ^^ ' ' P'^ ^ ^ T 

\k£Lj \ , ^-1 (k — a)— \{w + a)- 

\ / \ v=l ti;=0 

(k-a){i-\a\) P' = ( (i4a|) )- denote the summand 


k}in — 


-P 


i/W 


1 


{k — a)— \{'w + a)- 

The summation in Eqn. (|49p may be written as 

t—a 


, l<?;<t — a, 0 <rc<u . 


J = where, = E f'vw 5 — 1,2,... CL 


V=1 


w=0 


(48) 


(49) 
Therefore,

Comparing and we have, 


^vw 71 \ti; / i \a 

[k — a)— [w + a)- 


^v-\-l,w — 


W / , \CL 

— [w + a)- 


Then, 


{k — a)— {w + a)- 
lv+i,w+i _ c{v + l)\a\P'{w + l) 


Ivw {k — a — w){w + 1 + a) 


, l<t;<f — a — l, 0 <r(;<?; 


Since, Ivo = ^ ^ ; therefore, < c\a\. Therefore, for 1 < r; < t — a — 1 , 

’ 2-^w=o 


or, 


K., 


D+l 


.^+1 , 


E tl+i / 
u ;=0 


2 iir,, 


2 S?j)=0 

■v-U+l 


i?+1,k; ^ / ^w+l.ihJ+l 

< ^-h max ' 


^VQ 


— 0 \ ks 


< 2c|al, by Eqn. ([HU]) . 


< ^=0^^+!’^ < 4c|a| < 


4(f — a) 


K„ 


J 

Z^w=0 


< 


{k - a){l - ^)25p 25p-l 49 


(50) 



< (1 - |a|)2p-2“/3“i^i ( ^ ) (1 ) (1.006) 


< (1.006)(1 - |a|) 


< (0.2625) 


since, (i) c = < 0.256, (ii) (|7) < (|)“ < ( 3 )“, (hi) a = [pj+l and therefore 


( 2 )P 2 

(25p)^ — (25pp 

Substituting in Eqn. dST]) . we have, 


jA < a!, (iv) Pp < 7 ^ < 7 ^ and so, ^ = (p 2 / 3 )M) 


_ 2 .^/ 2 

pj^a ^ \p P) ( 25 pp ) k°- 


< 


Pi < (0.3) ('$ 


A:“ ) V(2)(25)2p 


(a-l) 


□ 
B.5 Completing Variance calculation for Averaged Taylor Polynomial Estimator

Lemma 30. Assume the premises of Lemma 22 and let $\mu = \lambda$. Let $y, y' \in Y$ be distinct. Then,

Proof. By Lemma 22,




fj'VV _ 

Uyy'TTyTTy, 


Taking the ratio of the v + 1st term and the rth term of the summation above, we obtain, 


{p-v)‘^\ ('rf\ ({t-v)\ (u + 1) ^ ^ ({p-l) 


{v + iy) \X^ J V v + ^ J V k — v 


2 \ 1 

< 


(25p)2 J - 2500 


Therefore, 


since, ^ < 3 . 


cov (^) (1 + S 


Lemma 31. Assume the premises of Lemma 22. Let $y, y' \in Y$ be distinct. Then,

\[ \mathrm{Cov}(\vartheta_y, \vartheta_{y'}) \le \frac{0.276p^2\lambda^{2p}\beta}{k} . \]


Proof. Case 1: /i = A. By Lemma [30l 


Cov {dy,dy^) < ^ y 2 p 2^2 _ 

Case 2: $\mu \ne \lambda$. Adding the expressions for $P_3$, $P_2$ and $P_1$ respectively from Lemmas 27 to 29, we obtain,

\[ P \le P_3 + P_2 + P_1 \le \frac{p^2\lambda^{2p}\beta}{k}\Big(0.275 + \frac{1}{1200}\Big) + P_1 \le \frac{0.276p^2\lambda^{2p}\beta}{k} . \qquad (51) \]

Therefore,

\[ \mathrm{Cov}(\vartheta_y, \vartheta_{y'}) \le P_{yy'} + Q_{yy'}, \text{ by Corollary 23} \]
\[ \le \frac{0.276p^2\lambda^{2p}\beta}{k}, \text{ by Eqn. (51) and Lemma 25.} \]

Thus, in all cases,

\[ \mathrm{Cov}(\vartheta_y, \vartheta_{y'}) \le \frac{0.276p^2\lambda^{2p}\beta}{k} . \]

□

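Lemma 32. Assume the premises of Lemma 22 and let $k \ge 1000$ and $n \ge 2$. Then,

\[ \mathrm{Var}[\bar{\vartheta}] \le \Big(\frac{0.288p^2}{k}\Big)\lambda^{2p-2}\eta^2 . \]

Proof.

\[ \mathrm{Var}[\bar{\vartheta}] = \frac{1}{|Y|^2}\Big(\sum_{y \in Y} \mathrm{Var}[\vartheta_y] + \sum_{y \ne y';\ y, y' \in Y} \mathrm{Cov}(\vartheta_y, \vartheta_{y'})\Big) \]
\[ \le \frac{1}{|Y|^2}\Big(|Y|(1.08)p^2\lambda^{2p-2}\eta^2 + 2\binom{|Y|}{2}\frac{0.276p^2\lambda^{2p-2}\eta^2}{k}\Big) \]
\[ \le \Big(\frac{0.288p^2}{k}\Big)\lambda^{2p-2}\eta^2, \text{ since } k \ge 1000 . \qquad (52) \]

The second step uses Corollary 3 and Lemma 31.

□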

C Proof that $\mathcal{G}$ holds with very high probability

C.1 Preliminaries and Auxiliary Events

The event GOODF 2 . Using standard algorithms for estimating F 2 such as mm, one can obtain 
an estimate F 2 satisfying IF 2 — L 2 I < 2 ^ 1^25 with probability 1 — n“25 using space 0(log^ n) bits. 

Then, F 2 = ^1 — ^2 satisfies F 2 < F 2 < -^ 2 , which is the event GOODF 2 . 

The event GOODEST essentially states that the CountSketch guarantees for accuracy of estimation hold for all items and at all levels.

Lemma 33. GOODEST holds with probability 1 — 

Proof. By guarantees of CountSketch structure m using tables with 16Q buckets and s = 8k = 
(8) (1000)(log n) tables with independent hash functions, we have, \fii — fi\ < (F|®®(C'/,^) 
with probability 1 — n“25_ Using union bound to add the error probability over the levels L = 
O(logn) and i £ [n], we obtain that GOODEST holds except with probability n“^^(L)(n) < 

□ 

The above events comprising Q will be shown to hold with probability 1 — . In order to 

do so, we define a few auxiliary events. 
Auxiliary Events

For $l \in \{0\} \cup [L]$ and $q \ge 1$, define the random variables

\[ H_{lq} = \sum_{1 \le \mathrm{rank}(i) \le 2^l q} y_{il} \quad \text{and} \quad U_{lq} = \sum_{\mathrm{rank}(i) > 2^l q} f_i^2 \cdot y_{il} \]

where, for $i \in [n]$, $y_{il}$ is an indicator variable that is 1 if $i \in S_l$ and is 0 otherwise. For $l \in \{0\} \cup [L]$, define two auxiliary events parameterized by a parameter $q$, as follows.

SMALL-H$(l, q) \equiv H_{lq} \le 2q$, and

SMALL-U$(l, q) \equiv U_{lq} \le \frac{1.5 F_2^{res}(2^{l-1}q)}{2^{l-1}}$.
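To make the two auxiliary quantities concrete, the sketch below computes them for a toy frequency vector and an explicitly given level set $S_l$. Everything here is illustrative: the function names are hypothetical, ranks are taken by decreasing $|f_i|$ with ties broken by index, $F_2^{res}(j)$ is read as the sum of $f_i^2$ over all items outside the top $j$, and $2^{l-1}q$ is assumed to be an integer.

```python
def ranked(freqs):
    """Item indices ordered by decreasing |f_i| (rank 1 first); ties broken by index."""
    return sorted(range(len(freqs)), key=lambda i: (-abs(freqs[i]), i))


def f2_residual(freqs, top):
    """Sum of f_i^2 over all items except the `top` highest-ranked ones."""
    return sum(freqs[i] ** 2 for i in ranked(freqs)[top:])


def h_and_u(freqs, level_set, level, q):
    """H_{lq}: sampled items among the top 2^l q ranks; U_{lq}: sampled squared mass below them."""
    order = ranked(freqs)
    cut = (2 ** level) * q
    h = sum(1 for i in order[:cut] if i in level_set)
    u = sum(freqs[i] ** 2 for i in order[cut:] if i in level_set)
    return h, u


def small_h(freqs, level_set, level, q):
    return h_and_u(freqs, level_set, level, q)[0] <= 2 * q


def small_u(freqs, level_set, level, q):
    half_cut = ((2 ** level) * q) // 2          # 2^(l-1) q, assumed integral
    threshold = 1.5 * f2_residual(freqs, half_cut) / (2 ** level / 2)
    return h_and_u(freqs, level_set, level, q)[1] <= threshold


freqs = [9, -7, 5, 3, -2, 1, 1, 0]
level_set = {0, 2, 4, 6}                        # a hypothetical sample S_1
print(h_and_u(freqs, level_set, 1, 2), small_h(freqs, level_set, 1, 2), small_u(freqs, level_set, 1, 2))
```

In the algorithm the level set is random (determined by the hash functions), so these predicates are events; the sketch only evaluates them for one fixed realisation.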


C.2 Proof that space parameter Ci is polynomial sized 

We will now show that $C_l = n^{\Omega(1)}$ for each $l \in \{0\} \cup [L]$. This would also imply that $B_l = C_l(27p)^{-2} = n^{\Omega(1)}$ for each $l \in \{0\} \cup [L]$.

Lemma 34. Assume the parameter values given in Figure\^ Then forp > 2, Cl > n^d). 


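Proof. Since $L = \lceil \log_{2\alpha}(n/C) \rceil$,

\[ C_L = 4\alpha^L C \ge \frac{(4\alpha)(2\alpha)^{\log_{2\alpha}(n/C)} C}{2^{\log_{2\alpha}(n/C)}} = \frac{4\alpha n}{(2^{\log_2(n/C)})^{1/\log_2(2\alpha)}} = \frac{4\alpha n}{(n/C)^{1/\log_2(2\alpha)}} . \qquad (53) \]

Let $\alpha = 1 - \gamma$. Then,

\[ \log_2(2\alpha) = 1 + \log_2(\alpha) = 1 + \frac{\ln(1 - \gamma)}{\ln 2} \ge 1 - \frac{2\gamma}{\ln 2} \]

since, $\gamma \le 1/2$. Hence,

\[ \frac{1}{\log_2(2\alpha)} \le \frac{1}{1 - 2\gamma/\ln(2)} \le 1 + \frac{4\gamma}{\ln 2} . \]

Let $C = Kn^{1-2/p}$. Substituting in (53),

\[ C_L \ge \frac{4\alpha n}{(n/C)^{1/\log_2(2\alpha)}} \ge \frac{4\alpha n}{(n/C)^{1 + 4\gamma/\ln(2)}} = 4\alpha C (n/C)^{-4\gamma/\ln(2)} \]
\[ = 4\alpha K n^{1-2/p} \cdot (K^{-1}n^{2/p})^{-4\gamma/\ln(2)} = 4\alpha K \cdot K' \cdot n^{1 - 2/p - (2/p)(4\gamma/\ln(2))} \qquad (54) \]

where, $K' = K^{4\gamma/\ln(2)}$.

Since, $\alpha = 1 - (1 - 2/p)\nu$, $\gamma = 1 - \alpha = (1 - 2/p)\nu$. The exponent of $n$ in (54) is

\[ 1 - 2/p - (2/p)(4\gamma/\ln(2)) = 1 - 2/p - (2/p)(1 - 2/p)\Big(\frac{4\nu}{\ln 2}\Big) = (1 - 2/p)\Big(1 - (2/p)\Big(\frac{4\nu}{\ln 2}\Big)\Big) \]

which is a positive constant for all $p > 2$ and $\nu \le (\ln 2)/4$. Thus, $C_L = n^{\Omega(1)}$. □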
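The positivity of the exponent is easy to confirm numerically. The sketch below evaluates the exponent of $n$ in both the expanded and the factored form, taking $\gamma = (1 - 2/p)\nu$ as in the proof; the function names and the sample values of $p$ and $\nu$ are arbitrary choices for illustration.

```python
from math import log


def exponent_of_n(p, nu):
    """Exponent 1 - 2/p - (2/p)(4*gamma/ln 2) with gamma = (1 - 2/p) * nu."""
    gamma = (1 - 2 / p) * nu
    return 1 - 2 / p - (2 / p) * (4 * gamma / log(2))


def factored_exponent(p, nu):
    """The same exponent in the factored form (1 - 2/p)(1 - (2/p)(4*nu/ln 2))."""
    return (1 - 2 / p) * (1 - (2 / p) * (4 * nu / log(2)))


for p in (2.1, 3.0, 10.0):
    for nu in (0.01, log(2) / 4):
        print(p, nu, exponent_of_n(p, nu), factored_exponent(p, nu))
```

At the extreme value $\nu = (\ln 2)/4$ the exponent equals $(1 - 2/p)^2$, which is still strictly positive for every $p > 2$.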

Remark. This is the only place where the fact $p > 2$ is explicitly used. If $p = 2$, then, $C_L$ would be $O(\epsilon^{-2})$, and $L$ would be $\log_2(n\epsilon^2) + O(1)$. The analysis would work, although the space bound would increase by a factor of $O(\log(n\epsilon^2))$.

C.3 Application of Chernoff-Hoeffding bounds for Limited Independence

We will use the following version of Chernoff-Hoeffding bounds for limited independence, specifically, Theorem 2.5 (II a) from [28].

Theorem 35 ([28]). Let $X_1, X_2, \ldots, X_n$ be $d$-wise independent random variables with support in $[0,1]$. Let $X = \sum_{i=1}^{n} X_i$ with $E[X] = \mu$. Then, for $\delta \ge 1$ and $d \le \lceil \delta\mu e^{-1/3} \rceil$,

\[ \Pr[|X - \mu| \ge \delta\mu] \le e^{-\lfloor d/2 \rfloor} . \]

The following lemma is shown; its proof is given later in this section.

Lemma 36. Suppose $d \le \lceil qe^{-1/3} \rceil$. Then, for $l \in \{0\} \cup [L]$ the following hold.

1) $\Pr[\text{SMALL-H}(l, q)] \ge 1 - e^{-\lfloor d/2 \rfloor}$, and,

2) either $U_{lq} = 0$ or $\Pr[\text{SMALL-U}(l, q)] \ge 1 - e^{-\lfloor d/2 \rfloor}$.

Lemma (Ml shows that Cl = This implies that Bl = = n^(i) since e = l/(27p). 

Therefore, Ci > Bi > Bl = for all I G {0} U [Lj. Hence we can use Lemma [361 and the union 
bound over I G {0} U [L] to show that the following events hold with probability 1 — = 

1 — > 1 — for suitable choice of the constant. 

(a) A;g{o}u[L] small-h(Z,(7/), ( 6) A;g{o}u[L] small-h(R, C;/ 2), and 

(c) Ai6{0}u[L] small-h(Z, \a^Bi/{l - 2€f] 

We now prove Lemma [36l 

Proof of Lemma 36. For any fixed $l$, $y_{il}$ is an indicator variable that is 1 iff $g_1(i) = g_2(i) = \cdots = g_l(i) = 1$. Since the $g_j$'s are drawn independently from a $d$-wise independent hash family, the $y_{il}$'s are $d$-wise independent.

By definition, $H_{lq} = \sum_{1 \le \mathrm{rank}(i) \le 2^l q} y_{il}$ is the number of items with rank $2^l q$ or less that have hashed to level $l$. Since, $\Pr[y_{il} = 1] = 1/2^l$, we have, $E[H_{lq}] = 2^l q \cdot \frac{1}{2^l} = q$. Therefore,

\[ \Pr[H_{lq} > 2q] \le \Pr[|H_{lq} - q| \ge q] \le e^{-\lfloor d/2 \rfloor} \]

by using Theorem 35 and assuming $d \le \lceil qe^{-1/3} \rceil$.
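We now prove the bound on $U_{lq}$. By definition, $U_{lq} = \sum_{\mathrm{rank}(i) > 2^l q} f_i^2 y_{il}$. Taking expectation,

\[ E[U_{lq}] = \sum_{\mathrm{rank}(i) > 2^l q} \frac{f_i^2}{2^l} = \frac{F_2^{res}(2^l q)}{2^l} . \]

Since, $|f_{\mathrm{rank}(2^l q)}| \le |f_{\mathrm{rank}(j)}|$ for each $j \in \{2^{l-1}q + 1, \ldots, 2^l q\}$, it follows that $f_{\mathrm{rank}(2^l q)}^2 \le F_2^{res}(2^{l-1}q)/(2^{l-1}q)$.

Case 1: Suppose $F_2^{res}(2^{l-1}q) > 0$. Define a scaled down variable as follows.

\[ U'_{lq} = \sum_{\mathrm{rank}(i) > 2^l q} \frac{f_i^2}{F_2^{res}(2^{l-1}q)/(2^{l-1}q)} \cdot y_{il} = \frac{(2^{l-1}q)\,U_{lq}}{F_2^{res}(2^{l-1}q)} . \]

By the above argument, the multiplier $f_i^2/(F_2^{res}(2^{l-1}q)/(2^{l-1}q)) \le 1$. Since the $y_{il}$ are indicator variables, $U'_{lq}$ is the sum of $d$-wise independent variables with support in the interval $[0,1]$. Taking expectation,

\[ E[U'_{lq}] = \frac{(2^{l-1}q)\,E[U_{lq}]}{F_2^{res}(2^{l-1}q)} = \frac{(2^{l-1}q)}{F_2^{res}(2^{l-1}q)} \cdot \frac{F_2^{res}(2^l q)}{2^l} \le \frac{q}{2} . \]

By Theorem 35, we obtain,

\[ \Pr[U'_{lq} > E[U'_{lq}] + q] \le \Pr[|U'_{lq} - E[U'_{lq}]| \ge q] \le e^{-\lfloor d/2 \rfloor} \]

provided, $d \le \lceil qe^{-1/3} \rceil$, which is assumed.

The event $U'_{lq} > E[U'_{lq}] + q$ may be equivalently written (by rescaling) as $U_{lq} > E[U_{lq}] + \frac{F_2^{res}(2^{l-1}q)}{2^{l-1}}$, which is the same as $U_{lq} > \frac{F_2^{res}(2^l q)}{2^l} + \frac{F_2^{res}(2^{l-1}q)}{2^{l-1}}$. This in turn is implied by the event $U_{lq} > \frac{1.5F_2^{res}(2^{l-1}q)}{2^{l-1}}$, since $F_2^{res}(2^l q) \le F_2^{res}(2^{l-1}q)$. Therefore,

\[ \Pr\Big[U_{lq} > \frac{1.5F_2^{res}(2^{l-1}q)}{2^{l-1}}\Big] \le \Pr[U'_{lq} > E[U'_{lq}] + q] \le e^{-\lfloor d/2 \rfloor} . \]

Case 2: $F_2^{res}(2^{l-1}q) = 0$. Then, $U_{lq} = 0$.

□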


Lemma 37. V/ G {0} U [L], small-h(/, Q), SMALL-h(/, |'F//(1 - 2e)^]) and SMALL-u(/, Q) hold 
simultaneously with probability 1 — 0(n“^®). 


Proof. From Lemma[36l SMALL-h(Z, C/) and small-u(/, Q) each holds with probability 
g-mm([d/2j ,Cie 1/3/2)^ Similarly, SMALL-h(/, \Bi/{l — 2 e)^]) holds with probability 

g-min(Ld/2j,(S,/(l-2e-)2)e-l/3/2)^ 

From Lemma [Ml we have. Cl > n^U)^ and hence, Ci > Cl > n^T) for each I G {0} U [L], 
Hence, d = O(logn) = o{Cl) = 0{Ci) for each 1. The failure probability is therefore since 

e = l/( 27 p), Bi = e'^Ci = n^T) and therefore, d = o{Bi), for each 1. 

Taking union bounds over the O(logn) values of I, the three events hold simultaneously except 
with probability {L + l)(3)e“'^/^ < (L + = o(n“^'^). 

□ 
C.4 Proof that SMALLRES, ACCUEST, GOODL, SMALLHH hold with very high probability

Lemma 38. Let $L = \lceil \log_{2\alpha}(n/C) \rceil$ and let the hash functions $g_1, g_2, \ldots, g_L$ be drawn from a $d$-wise independent family with $d = O(\log n)$ and even. Suppose SMALL-H$(l, C_l)$ and SMALL-U$(l, C_l)$ hold for each $l \in \{0\} \cup [L]$. Then, SMALLRES holds.

Proof. We first show that smallreS/ = (2C';, 1) < [{2ayC) is implied by small-h 

{I, Cl) and small-u(Z, C/). 

If small-h(Z,Cz) holds, then, Hi^Ci ^ that is, I]i<i.ank(j)< 2 'C'; Vn - Hence, 

1 (nl-l 

Fr m,i) < Y. ^ 

rank(^)>2^Ci 



where the last inequality follows since small-u(Z, Q) holds. 

Further, 2^~^Ci = 2^“^(4a^C') > 2(2a)^C', since, 0 < a < 1. Thus, 

Hence smallres^ holds, for each I £ {0} U [L], or equivalently, smallres holds. □ 

Lemma 39. GOODEST $\wedge$ SMALLRES imply ACCUEST.

Proof. Fix $i \in [n]$ and $l \in \{0\} \cup [L]$. By construction, $C_l = 4\alpha^l C$. Thus,

, f . ,2 / Fr {Cl, l) ^ 1.5Fr (2(2a)'C) ^ F^ ((2«)^C) 

- Cl - 2'-i(WC) - 2(2a)^C 

where the first step follows from GOODEST and the second step follows from smallres. □ 

We now show that the HH structure discovers all items and their exact frequencies that map to level $L$ (with high probability).

Lemma 40. For $L = \lceil \log_{2\alpha}(n/C) \rceil$ and assuming SMALL-H$(L, C_L)$ and GOODEST$_L$ hold, the frequencies of all the items in $S_L$ are discovered without error using HH$_L$. That is, SMALL-H$(L, C_L) \wedge$ GOODEST$_L$ implies GOODFINALLEVEL.

Proof. Let $L = \lceil \log_{2\alpha}(n/C) \rceil$. Then,

\[ 2^L(C_L/2) = 2^L(4\alpha^L C/2) = 2(2\alpha)^L C \ge 2(n/C)C = 2n . \]

By definition, $H_{L, C_L/2} = \sum_{1 \le \mathrm{rank}(i) \le 2^L(C_L/2)} y_{iL}$ counts the number of items that map to level $L$ with ranks in $1, 2, \ldots, 2^L(C_L/2)$. But $2^L(C_L/2) \ge n$. Hence, $H_{L, C_L/2}$ is the number of items that map to level $L$. Since, SMALL-H$(L, C_L/2)$ holds, $H_{L, C_L/2} \le C_L$. Hence, $F_2^{res}(C_L, L) = 0$. By GOODEST$_L$, $|\hat{f}_{iL} - f_i| \le (F_2^{res}(C_L, L)/C_L)^{1/2} = 0$. Thus if $i \in S_L$ then $\hat{f}_{iL} = f_i$. □
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Remark 1. Lemma HQ] can be proved as an implication of the event SMALL-h(^, C^) by using 
an ^2/^1-compressed sensing recovery procedure as in I14|. 

Remark 2. In the turnstile streaming model assumed, we say that i appears in the stream 
iff \h\ > 1. By Lemma 1401 the frequencies of all items are discovered exactly. Hence items with 
non-zero frequencies, that is, those with \fi\ > 1 would satisfy {ful = \fi\ > 1/2 = Ql and thus 
would qualify the criterion of being discovered at level L. All other items would satisfy \fiL\ = 0 
and will not be discovered at level L. 
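The counting step in the proof of Lemma 40 can be checked numerically. The following is a minimal sketch, assuming $C_l = 4\alpha^l C$ as in the proof; the function names `final_level` and `rank_cutoff` and the sample parameter values are illustrative and not part of the paper.

```python
import math

def final_level(n: int, C: float, alpha: float) -> int:
    """L = ceil(log_{2*alpha}(n / C)), the last level of the structure."""
    return math.ceil(math.log(n / C) / math.log(2 * alpha))

def rank_cutoff(n: int, C: float, alpha: float) -> float:
    """2^L * (C_L / 2) with C_L = 4 * alpha^L * C (assumed from the proof).

    Algebraically this equals 2 * (2*alpha)^L * C, which the proof
    lower-bounds by 2n.
    """
    L = final_level(n, C, alpha)
    C_L = 4 * alpha**L * C
    return 2**L * (C_L / 2)
```

Since the cutoff is at least $2n$, every item has rank below it, which is why $H_{L, C_L/2}$ counts all items that reach level $L$.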

At each level $l$, the algorithm finds the top-$C_l$ items by absolute values of estimated frequencies. A heavy-hitter at a level $l$ is however defined as an item whose estimated frequency crosses the threshold $Q_l$. The event SMALLHH$_l$ states that the heavy-hitters at a level $l$ are always among the top-$C_l$ items by absolute estimated frequencies.

Lemma 41. Suppose SMALL-H$(l, \lceil B_l/(1-2\epsilon)^2 \rceil)$ holds for each $l \in \{0\} \cup [L-1]$ and suppose ACCUEST holds. Then, SMALLHH holds.


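Proof. Let $H'_l$ denote the set of items that are discovered as heavy-hitters at level $l$, that is, $H'_l = \{i \in S_l : |\hat{f}_{il}| \ge Q_l\}$, where $Q_l = T_l(1-\epsilon)$. By ACCUEST and since $\epsilon = (B/C)^{1/2}$, we obtain
\[ |\hat{f}_{il} - f_i| \le \left( \frac{F_2^{\mathrm{res}}((2\alpha)^l C)}{2(2\alpha)^l C} \right)^{1/2} . \]
Suppose $i \in H'_l$. Then,
\[ |f_i| \ge Q_l - \frac{\epsilon}{\sqrt{2}} \left( \frac{F_2}{(2\alpha)^l B} \right)^{1/2} \ge T_l(1-\epsilon) - T_l(\epsilon/\sqrt{2}) \ge T_l(1-2\epsilon) \]
since $T_l = (\hat{F}_2/((2\alpha)^l B))^{1/2} \ge (F_2/((2\alpha)^l B))^{1/2}$.
Therefore,
\[ \mathrm{rank}(i) \le \frac{F_2}{(T_l(1-2\epsilon))^2} \le \frac{F_2 (2\alpha)^l B}{F_2 (1-2\epsilon)^2} \le \frac{2^l B_l}{(1-2\epsilon)^2} . \]
Hence $H'_l \subset H_{lq}$, where we let $q = B_l/(1-2\epsilon)^2$.

Since SMALL-H$(l, q)$ holds, $H_{lq} \le 2q$. Further, since $H'_l \subset H_{lq}$, we have $|H'_l| \le 2q = 2B_l/(1-2\epsilon)^2 \le C_l$, since, by choice of parameters, $\epsilon = (B_l/C_l)^{1/2} = 1/(27p)$ and $p \ge 1$.

By construction, $H'_l$ is the set of items whose estimated frequencies are at least $Q_l$. Hence,
\[ H'_l = \mathrm{TOPK}_l(|H'_l|) \subset \mathrm{TOPK}_l(C_l) . \]
□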


C.5 Proof that NOCOLLISION holds with very high probability 

Lemma 42. If t > 6 and s = 0(logn), then, NOCOLL holds with probability at least 1 — n 
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Proof. Assume full independence of hash functions. For i G ToPKi(Ci) and I G [2s], let Wiji = 1 
if $i$ collides with some other item in TOPK$_l(C_l)$ in the $j$th table of the TPEST structure at level $l$. Since each table at level $l \in \{0\} \cup [L-1]$ has $16C_l$ buckets,
\[ q = \Pr[w_{ijl} = 1] = 1 - \left(1 - \frac{1}{16C_l}\right)^{C_l - 1} \le 1/16 . \]
Let $W_{il} = \sum_{j=1}^{2s} (1 - w_{ijl})$ be the number of tables where $i$ does not collide with any other item of TOPK$_l(C_l)$. Then, $E[W_{il}] \ge (1-q)(2s) \ge (15/8)s$. By Chernoff's bounds,
\[ \Pr[W_{il} \ge s] \ge 1 - \exp\{-(15/8)s(7/15)^2/2\} \ge 1 - e^{-0.2 s} = 1 - e^{-(0.2)(8)(100)\log(n)} = 1 - n^{-160} \]
since $s = 8k = 8(100 \log(n))$.
By union bound,
\[ \Pr[\forall i \in \mathrm{TOPK}_l(C_l)\ (W_{il} \ge s)] \ge 1 - C_l e^{-0.2 s} = 1 - C_l n^{-160} . \]
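The two numeric facts used in the full-independence argument above can be sanity-checked directly. This is a small sketch; the helper names are hypothetical, and the Chernoff form $\exp(-\mu\delta^2/2)$ with $\mu = (15/8)s$ and $\delta = 7/15$ is read off from the displayed bound.

```python
def collision_prob(C_l: int) -> float:
    """Pr[w_ijl = 1] for C_l - 1 other items hashed into 16 * C_l buckets."""
    return 1.0 - (1.0 - 1.0 / (16 * C_l)) ** (C_l - 1)

def chernoff_rate() -> float:
    """Coefficient of s in the exponent exp(-(15/8) * s * (7/15)^2 / 2).

    With mean at least (15/8) s, the target s equals (1 - 7/15) times
    the mean, so delta = 7/15 in the lower-tail Chernoff bound.
    """
    delta = 7 / 15
    return (15 / 8) * delta**2 / 2
```

The rate evaluates to slightly more than 0.2, which is what allows the bound to be simplified to $e^{-0.2s}$.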


Assuming $t$-wise independence of the hash family from which the $h_{lj}$'s are drawn, denote $q_t = \Pr_t[w_{ijl} = 1]$, where the subscript $t$ denotes $t$-wise independence. Let $u_{ikjl} = 1$ if $i$ and $k$ collide under hash function $h_{lj}$ for the $j$th hash table in the structure TPEST$_l$. Let $S_{il} = \mathrm{TOPK}_l(C_l) \setminus \{i\}$. Then, by inclusion-exclusion,
\[ 1 - q = \Pr[w_{ijl} = 0] = 1 - \Pr\Big[\bigcup_{k \in S_{il}} (u_{ikjl} = 1)\Big] = 1 + \sum_{r=1}^{|S_{il}|} (-1)^r \sum_{\{k_1, k_2, \ldots, k_r\} \subset S_{il}} \Pr[u_{ik_1jl} = 1, u_{ik_2jl} = 1, \ldots, u_{ik_rjl} = 1] \tag{55} \]
\[ 1 - q_t = \Pr_t[w_{ijl} = 0] = 1 + \sum_{r=1}^{|S_{il}|} (-1)^r \sum_{\{k_1, k_2, \ldots, k_r\} \subset S_{il}} \Pr_t[u_{ik_1jl} = 1, u_{ik_2jl} = 1, \ldots, u_{ik_rjl} = 1] \tag{56} \]
Further, the sum of the tail starting from position $t+1$ to $|S_{il}|$ is, in absolute value, dominated by the $t$th term. Therefore, from (55), we have,
\[ \Big| q - \sum_{r=1}^{t-1} (-1)^{r+1} \sum_{\{k_1, \ldots, k_r\} \subset S_{il}} \Pr[u_{ik_1jl} = 1, \ldots, u_{ik_rjl} = 1] \Big| \le \sum_{\{k_1, \ldots, k_t\} \subset S_{il}} \Pr[u_{ik_1jl} = 1, \ldots, u_{ik_tjl} = 1] \tag{57} \]
Similarly from (56), we have,
\[ \Big| q_t - \sum_{r=1}^{t-1} (-1)^{r+1} \sum_{\{k_1, \ldots, k_r\} \subset S_{il}} \Pr_t[u_{ik_1jl} = 1, \ldots, u_{ik_rjl} = 1] \Big| \le \sum_{\{k_1, \ldots, k_t\} \subset S_{il}} \Pr_t[u_{ik_1jl} = 1, \ldots, u_{ik_tjl} = 1] \tag{58} \]
By $t$-wise independence, the probability terms in the above expressions are identical for $r = 1, \ldots, t$, that is, for any $1 \le k_1 < k_2 < \ldots < k_r \le n$ and $2 \le r \le t$,
\[ \Pr_t[u_{ik_1jl} = 1, u_{ik_2jl} = 1, \ldots, u_{ik_rjl} = 1] = \Pr[u_{ik_1jl} = 1, u_{ik_2jl} = 1, \ldots, u_{ik_rjl} = 1] . \]
Therefore, by triangle inequality,
\[ |q_t - q| \le 2 \sum_{k_1 < k_2 < \ldots < k_t} \Pr_t[u_{ik_1jl} = 1, u_{ik_2jl} = 1, \ldots, u_{ik_tjl} = 1] \tag{59} \]
Since there are $16C_l$ buckets in the TPEST structure at level $l$, we have $\Pr[u_{ikjl} = 1] = 1/(16C_l)$. Substituting in (59),
\[ |q - q_t| \le 2 \binom{|S_{il}|}{t} \left(\frac{1}{16C_l}\right)^t \le 2 \binom{C_l - 1}{t} (16C_l)^{-t} \le 2 \left(\frac{C_l e}{16 C_l t}\right)^t \le 2 \left(\frac{e}{16t}\right)^t \]
since $|S_{il}| = C_l - 1$. For $t \ge 6$, $|q - q_t| \le 2(32)^{-6} \le 2^{-29}$.

The above Chernoff's bound argument may be repeated using probability of success $1 - q_t \ge 1 - q - 2^{-29}$, instead of $1 - q$. Hence, NOCOLL holds with very high probability, by calculations similar to the previous one. □
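The closing estimate for the gap between full and $t$-wise independence is easy to evaluate. The sketch below assumes the bound takes the form $2\binom{C_l-1}{t}(16C_l)^{-t} \le 2(e/(16t))^t$; the function names are illustrative only.

```python
import math

def twise_gap_bound(t: int) -> float:
    """Closed-form bound 2 * (e / (16 t))^t on |q - q_t|."""
    return 2.0 * (math.e / (16 * t)) ** t

def binomial_gap_term(C_l: int, t: int) -> float:
    """The term 2 * C(C_l - 1, t) / (16 * C_l)^t before simplification."""
    return 2.0 * math.comb(C_l - 1, t) / (16 * C_l) ** t
```

For $t = 6$ the closed form is already below $2 \cdot 32^{-6} = 2^{-29}$, so the per-table collision probability changes only negligibly under 6-wise independence.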


C.6 Proof that $\mathcal{G}$ holds with very high probability
Restated Lemma (Restatement of Lemma 7). $\Pr[\mathcal{G}] \ge 1 - O(n^{-c})$.

Proof. By adding the failure probabilities of all the events comprising $\mathcal{G}$, using Lemmas 33 through 42, the statement of the lemma follows. □


C.7 Technical fact 


The following fact gives a bound on the difference between the unconditional probability of an event $E$ and its probability conditioned on an event $F$. It essentially shows that the probability of $E$ is not significantly altered if it is conditioned on a very high probability event $F$, that is, one with $\Pr[F]$ close to 1.

Fact 43. Let $E$ and $F$ be a pair of events such that $\Pr[F] > 0$. Then, $|\Pr[E \mid F] - \Pr[E]| \le 1 - \Pr[F]$.

Proof of Fact 43. If $\Pr[F] = 1$, then $\Pr[E, F] = \Pr[E] + \Pr[F] - \Pr[E \cup F] = \Pr[E] + 1 - 1 = \Pr[E]$, and hence the statement holds. Otherwise,
\[ \Pr[E] = \Pr[E \mid F] \Pr[F] + \Pr[E \mid \neg F] \Pr[\neg F] . \]
Subtracting $\Pr[E \mid F]$ from both sides yields,
\[ \Pr[E] - \Pr[E \mid F] = \Pr[E \mid F](\Pr[F] - 1) + \Pr[E \mid \neg F] \Pr[\neg F] = (-\Pr[E \mid F] + \Pr[E \mid \neg F]) \Pr[\neg F] . \]
Taking absolute values and noting that $|-\Pr[E \mid F] + \Pr[E \mid \neg F]| \le 1$, we have $|\Pr[E] - \Pr[E \mid F]| \le \Pr[\neg F]$.
□

The fact is used by letting $F = \mathcal{G}$. Then, for any event $E$, $|\Pr[E \mid \mathcal{G}] - \Pr[E]| \le \Pr[\neg \mathcal{G}]$, which is bounded by Lemma 7.
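Fact 43 can be verified exhaustively on a small finite probability space. The following sketch uses exact rational arithmetic; the function names and the four-outcome example space are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations

def conditioning_gap(weights, E, F):
    """Return (|Pr[E|F] - Pr[E]|, 1 - Pr[F]) on a finite space.

    weights[k] is the unnormalised mass of outcome k; E and F are
    sets of outcome indices, and F must have positive mass.
    """
    total = sum(weights)

    def pr(A):
        return Fraction(sum(weights[k] for k in A), total)

    gap = abs(pr(E & F) / pr(F) - pr(E))
    return gap, 1 - pr(F)

def all_subsets(n):
    """All subsets of {0, ..., n-1} as Python sets."""
    return [set(c) for r in range(n + 1) for c in combinations(range(n), r)]
```

Looping over every pair of events with non-empty $F$ and comparing the two returned values confirms the inequality on this space.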


D Basic Sampling Properties of Geometric-Hss Algorithm 


Preliminaries. The following lemma argues that the frequency ranges defining lmargin, mid and rmargin are non-empty intervals.

Lemma 44. For $p \ge 2$ and for each $l \in \{0\} \cup [L]$, the frequency ranges that define lmargin$(G_l)$, mid$(G_l)$ and rmargin$(G_l)$ are non-empty intervals.

Proof. The statement of the lemma is obviously true from the definitions for lmargin$(G_l)$ and rmargin$(G_l)$.

For mid$(G_l)$, the interval range is $[T_l(1+\epsilon), T_{l-1}(1-2\epsilon))$. This range is non-empty iff $T_{l-1}(1-2\epsilon) > T_l(1+\epsilon)$, or, $T_{l-1}/T_l > (1+\epsilon)/(1-2\epsilon)$, or, $(2\alpha)^{1/2} > (1 + 1/(27p))/(1 - 2/(27p))$, which is true for $\alpha = 1 - 2(0.01)/p \ge 0.99$. □
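The non-emptiness condition for mid$(G_l)$ reduces to a single inequality in $p$, which the following sketch evaluates. It assumes the parameter choices $\epsilon = 1/(27p)$ and $\alpha = 1 - 0.02/p$ stated in the proof; the function name is hypothetical.

```python
def mid_interval_nonempty(p: float) -> bool:
    """Check (2*alpha)^(1/2) > (1 + eps) / (1 - 2*eps) for the paper's
    parameter choices eps = 1/(27p) and alpha = 1 - 0.02/p."""
    eps = 1 / (27 * p)
    alpha = 1 - 0.02 / p
    return (2 * alpha) ** 0.5 > (1 + eps) / (1 - 2 * eps)
```

At $p = 2$ the left side is about 1.407 and the right side is $55/52 \approx 1.058$, so the interval has substantial width.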


Our analysis is conditioned on $\mathcal{G}$. Assuming $\mathcal{G}$ holds, the event ACCUEST holds, and therefore, the frequency estimation error by the HH$_l$ structure is bounded as follows.
\[ |\hat{f}_{il} - f_i| \le \left( \frac{F_2^{\mathrm{res}}((2\alpha)^l C)}{(2\alpha)^l C} \right)^{1/2} \tag{60} \]
We first prove a property about the relation between the level at which an item is discovered and the group $G_l$ to which an item belongs. This property is then used to derive a relation between the probabilities with which an item may belong to different sampled groups.


D.1 Properties concerning levels at which an item is discovered
Lemma 45. The following properties hold conditional on $\mathcal{G}$.

1) Suppose $i \in$ lmargin$(G_l)$ for some $0 \le l \le L-1$. Then, (a) $\Pr[l_d(i) \le l-1 \mid \mathcal{G}] = 0$, and (b) the event $\{l_d(i) = l, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}$.

2) Suppose $i \in$ mid$(G_l)$ for some $0 \le l \le L$. Then, (a) $\Pr[l_d(i) \le l-1 \mid \mathcal{G}] = 0$, (b) the event $\{l_d(i) = l, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}$, and, (c) $\Pr[|\hat{f}_{il}| \ge T_l \mid i \in S_l, \mathcal{G}] = 1$.

3) Suppose $i \in$ rmargin$(G_l)$ for some $2 \le l \le L$. Then, (a) $\Pr[l_d(i) \le l-2 \mid \mathcal{G}] = 0$, (b) $\{i \in S_l, \mathcal{G}\}$ implies $\{|\hat{f}_{il}| \ge T_l\}$, and (c) $\Pr[l_d(i) = l \mid l_d(i) \ne l-1, \mathcal{G}] = \Pr[i \in S_l \mid l_d(i) \ne l-1, \mathcal{G}]$.


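Proof of Lemma 45. Since ACCUEST holds as a sub-event of $\mathcal{G}$, we have $|\hat{f}_{il} - f_i| \le \epsilon T_l$, by Eqn. (60). Also, $Q_l = T_l(1-\epsilon)$. All statements below are conditional on $\mathcal{G}$.

Case: $i \in$ lmargin$(G_l) \cup$ mid$(G_l)$, $l \ge 1$. Then, $T_l + \epsilon T_l \le |f_i| < T_{l-1} - 2\epsilon T_{l-1}$. Therefore for $r \le l-1$,
\[ |\hat{f}_{ir}| \le |f_i| + \epsilon T_r < T_{l-1} - 2\epsilon T_{l-1} + \epsilon T_r \le T_r - 2\epsilon T_r + \epsilon T_r = T_r - \epsilon T_r = Q_r . \]
Hence, $\Pr[l_d(i) \le l-1 \mid \mathcal{G}] = 0$.

Further, if $i \in S_l$ and $i \in$ lmargin$(G_l) \cup$ mid$(G_l)$, then $|\hat{f}_{il}| \ge |f_i| - \epsilon T_l \ge T_l - \epsilon T_l = Q_l$ and so $i$ is discovered at level $l$, if $i$ has not been discovered at an earlier level. However, part (a) states that $i$ cannot be discovered at levels $l-1$ or less. Hence $i$ is discovered at level $l$. Thus, conditional upon $\mathcal{G}$, if $i \in S_l$, then $l_d(i) = l$. Conversely, if $i \notin S_l$, then $l_d(i) \ne l$. Hence, the events $\{i \in S_l\}$ and $\{l_d(i) = l\}$ are equivalent, conditional on $\mathcal{G}$. This proves parts 1(b) and 2(b).

Case: $i \in$ mid$(G_l)$. If $i \in S_l$, then $|\hat{f}_{il}| \ge |f_i| - \epsilon T_l \ge T_l + \epsilon T_l - \epsilon T_l = T_l$. This proves part 2(c).

Case: $i \in$ rmargin$(G_l)$. Then, $|f_i| < T_{l-1}$. Let $r \le l-2$. Then,
\[ |\hat{f}_{ir}| \le |f_i| + \epsilon T_r < T_{l-1} + \epsilon T_r \le T_r - \epsilon T_r = Q_r \]
where the last inequality $T_{l-1} + \epsilon T_r \le T_r - \epsilon T_r$ follows since it is equivalent to $T_{l-1}/T_r \le (1 - 2\epsilon)$, which holds since $T_{l-1}/T_r \le (2\alpha)^{-1/2} \le 0.72$ and $(1 - 2\epsilon) = 1 - 2/(27p) \ge 0.96$. Hence $\Pr[l_d(i) \le l-2 \mid \mathcal{G}] = 0$.

We are given that $i \in$ rmargin$(G_l)$. Suppose that $i \in S_l$. Then,
\[ |\hat{f}_{il}| \ge T_{l-1} - 2\epsilon T_{l-1} - \epsilon T_l = T_l (2\alpha)^{1/2} - 2(2\alpha)^{1/2} \epsilon T_l - \epsilon T_l \ge T_l (1.40(1 - (2)(0.04)) - (0.04)) = 1.248\, T_l > T_l \tag{61} \]
Hence, $\Pr[|\hat{f}_{il}| \ge T_l \mid i \in S_l, \mathcal{G}] = 1$, and therefore, by Eqn. (61), $\Pr[l_d(i) \in \{l-1, l\} \mid i \in S_l, \mathcal{G}] = 1$. This proves part 3(b).

Since $l_d(i) > l$ implies $i \in S_l$, we have,
\[ \Pr[l_d(i) > l \mid \mathcal{G}] = \Pr[l_d(i) > l, i \in S_l \mid \mathcal{G}] = \Pr[l_d(i) > l \mid i \in S_l, \mathcal{G}] \cdot \Pr[i \in S_l \mid \mathcal{G}] \le (1 - \Pr[l_d(i) \in \{l-1, l\} \mid i \in S_l, \mathcal{G}]) \cdot \Pr[i \in S_l \mid \mathcal{G}] = 0 . \]
Hence,
\[ \Pr[l_d(i) \ne l-1 \mid \mathcal{G}] = \Pr[l_d(i) \le l-2 \mid \mathcal{G}] + \Pr[l_d(i) = l \mid \mathcal{G}] + \Pr[l_d(i) > l \mid \mathcal{G}] = 0 + \Pr[l_d(i) = l \mid \mathcal{G}] + 0 . \tag{62} \]
It follows that
\[ \Pr[l_d(i) = l \mid l_d(i) \ne l-1, \mathcal{G}] = \frac{\Pr[l_d(i) = l \mid \mathcal{G}]}{\Pr[l_d(i) \ne l-1 \mid \mathcal{G}]} \]
by Eqn. (62).

□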
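The numeric constants used in the rmargin case (0.72, 0.96 and 1.248) follow from the parameter choices and can be recomputed. The sketch below assumes $\epsilon = 1/(27p)$, $\alpha = 1 - 0.02/p$ and $T_{l-1}/T_l = (2\alpha)^{1/2}$; the function name and return layout are illustrative.

```python
def rmargin_constants(p: float):
    """Return (threshold ratio bound, 1 - 2*eps, lower bound on |f_hat_il| / T_l).

    ratio : (2*alpha)^(-1/2), an upper bound on T_{l-1} / T_r for r <= l-2
    slack : 1 - 2*eps
    lower : (2*alpha)^(1/2) * (1 - 2*eps) - eps, the factor in Eqn. (61)
    """
    eps = 1 / (27 * p)
    alpha = 1 - 0.02 / p
    ratio = (2 * alpha) ** -0.5
    slack = 1 - 2 * eps
    lower = (2 * alpha) ** 0.5 * slack - eps
    return ratio, slack, lower
```

For every $p \ge 2$ the ratio stays below the slack, and the lower bound exceeds the value 1.248 quoted in Eqn. (61), since the paper rounds $\epsilon$ up to 0.04 and $(2\alpha)^{1/2}$ down to 1.40.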
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D.2 Probability of items belonging to sampled groups 
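Restated Lemma (Re-statement of Lemma 8). Let $i \in G_l$.

1) Suppose $i \in$ mid$(G_l)$. Then, (a) the event $\{i \in \bar{G}_l, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}$, (b) $2^l \Pr[i \in \bar{G}_l \mid \mathcal{G}] = 1 \pm 2^l n^{-c}$, and, (c) $\Pr[i \in \cup_{r \ne l} \bar{G}_r \mid \mathcal{G}] = 0$.

2) Suppose $i \in$ lmargin$(G_l)$. Then, (a) $\Pr[i \in \cup_{r \notin \{l, l+1\}} \bar{G}_r \mid \mathcal{G}] = 0$, (b) the event $\{i \in \bar{G}_l \cup \bar{G}_{l+1}, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}$, and (c) $2^{l+1} \Pr[i \in \bar{G}_{l+1} \mid \mathcal{G}] + 2^l \Pr[i \in \bar{G}_l \mid \mathcal{G}] = 1 \pm 2^l n^{-c}$.

3) Suppose $i \in$ rmargin$(G_l)$. Then, (a) $\Pr[i \in \cup_{r \notin \{l-1, l\}} \bar{G}_r \mid \mathcal{G}] = 0$, (b) the events $\{i \in \bar{G}_{l-1} \cup \bar{G}_l\} \subset \{i \in S_{l-1}\}$, (c) $\{i \in S_l, l_d(i) \ne l-1\} \subset \{i \in \bar{G}_l\}$, and, (d) $2^l \Pr[i \in \bar{G}_l \mid \mathcal{G}] + 2^{l-1} \Pr[i \in \bar{G}_{l-1} \mid \mathcal{G}] = 1 \pm O(2^l n^{-c})$.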

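Proof of Lemma 8. Assume $\mathcal{G}$ holds for the arguments in this proof. Suppose $i \in S_l$. Then $|\hat{f}_{il} - f_i| \le T_l \epsilon$.

Case: $i \in$ mid$(G_l)$. Part 1(a). Since $i \in$ mid$(G_l)$, $|f_i| \ge T_l + \epsilon T_l$. Conditional on $\mathcal{G}$, ACCUEST holds, and therefore,
\[ |\hat{f}_{il}| \ge |f_i| - \epsilon T_l \ge T_l + \epsilon T_l - \epsilon T_l = T_l . \]
Therefore,
\[ \{i \in S_l, \mathcal{G}\} \subset \{|\hat{f}_{il}| \ge T_l, \mathcal{G}\} \tag{63} \]
Then,
\[ \{i \in \bar{G}_l, \mathcal{G}\} = \{l_d(i) = l, |\hat{f}_{il}| \ge T_l, \mathcal{G}\} \]
\[ = \{i \in S_l, |\hat{f}_{il}| \ge T_l, \mathcal{G}\}, \text{ since } \{l_d(i) = l, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}, \text{ Lemma 45 (2b)} \]
\[ = \{i \in S_l, \mathcal{G}\}, \text{ by Eqn. (63).} \]
This proves part 1(a).

Part 1(b).
\[ \Pr[i \in \bar{G}_l \mid \mathcal{G}] = \Pr[l_d(i) = l, |\hat{f}_{il}| \ge T_l \mid \mathcal{G}] + \Pr[l_d(i) = l-1, Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1}, K_i = 1 \mid \mathcal{G}] \tag{64} \]
Denote by $E_1$ the event $l_d(i) = l-1, Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1}$ and by $E_2$ the event $Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1}$. Then,
\[ \Pr[E_1, K_i = 1 \mid \mathcal{G}] = \Pr[K_i = 1 \mid E_1, \mathcal{G}] \cdot \Pr[E_2 \mid l_d(i) = l-1, \mathcal{G}] \cdot \Pr[l_d(i) = l-1 \mid \mathcal{G}] = 0 \]
since $\Pr[l_d(i) = l-1 \mid \mathcal{G}] = 0$, by Lemma 45 part (2a). Substituting in Eqn. (64), we have,
\[ \Pr[i \in \bar{G}_l \mid \mathcal{G}] = \Pr[l_d(i) = l, |\hat{f}_{il}| \ge T_l \mid \mathcal{G}] \]
\[ = \Pr[i \in S_l, |\hat{f}_{il}| \ge T_l \mid \mathcal{G}], \text{ since } \{l_d(i) = l, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}, \text{ Lemma 45 (2b)} \]
\[ = \Pr[i \in S_l \mid \mathcal{G}], \text{ by part 1(a)} \]
\[ = 2^{-l} \pm n^{-c}, \text{ by Fact 43.} \]
Multiplying by $2^l$, we have $2^l \Pr[i \in \bar{G}_l \mid \mathcal{G}] \in 1 \pm n^{-c} \cdot 2^l$, as claimed in part 1(b).

Part 1(c). We have by ACCUEST that for any $0 \le r \le l-1$,
\[ |\hat{f}_{i,r}| < T_{l-1} - 2\epsilon T_{l-1} + \epsilon T_r \le T_r(1-\epsilon) = Q_r . \]
Hence, $i$ cannot be in $\bar{G}_r$ for any $r \le l-1$. We have by part (1a) that $\{i \in \bar{G}_l, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}$.

Let $i \in \bar{G}_r$ for some $r \ge l+1$. For $i$ to belong to $\bar{G}_r$, $i$ must be in $S_{r-1}$ and hence, by the sub-sampling procedure, $i \in S_l$. By part 1(a), $\{i \in \bar{G}_l, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}$, and therefore, $i \in \bar{G}_l$. Hence, $i \notin \bar{G}_r$, for any $r \ge l+1$. Thus,
\[ \Pr[i \in \cup_{r \ne l} \bar{G}_r \mid \mathcal{G}] = 0 . \]

Case: $i \in$ lmargin$(G_l)$. From Lemma 45, $l_d(i) \ge l$ and $l_d(i) = l$ iff $i \in S_l$. Since $l_d(i) \ge l$, $i \notin \bar{G}_r$, for any $r < l$. Consider $r > l+1$. If $i \in \bar{G}_r$, then $l_d(i) \ge r-1 \ge l+1$. Since $i \in S_{l_d(i)}$ and $l_d(i) \ge l+1$, it follows that $i \in S_l$, by the sub-sampling procedure. However, by Lemma 45 part (1b), $\{l_d(i) = l, \mathcal{G}\} \equiv \{i \in S_l, \mathcal{G}\}$. Hence, in this case, $l_d(i) = l$, contradicting the implication that $l_d(i) \ge l+1$. Thus,
\[ \Pr[i \in \cup_{r \notin \{l, l+1\}} \bar{G}_r \mid \mathcal{G}] = 0 \]
proving part 2(a).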

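Suppose $i \in S_l$. Then, $l_d(i) = l$ and $\hat{f}_i = \hat{f}_{il}$. By construction,
\[ \Pr[i \in \bar{G}_l \mid i \in S_l, \mathcal{G}] = \Pr[|\hat{f}_{il}| \ge T_l \mid i \in S_l, \mathcal{G}] = p_{il} \text{ (say)}. \tag{65} \]
Further,
\[ \Pr[i \in \bar{G}_{l+1} \mid i \in S_l, \mathcal{G}] = \Pr[Q_l \le |\hat{f}_{il}| < T_l, K_i = 1 \mid i \in S_l, \mathcal{G}] + \Pr[|\hat{f}_{il}| < Q_l, i \in S_{l+1}, |\hat{f}_{i,l+1}| \ge T_{l+1} \mid i \in S_l, \mathcal{G}] \tag{66} \]
However, conditional on $\mathcal{G}$ and $i \in S_l$, by Lemma 45, $|\hat{f}_{il}| \ge Q_l$. Hence, the second probability in the RHS of Eqn. (66) is 0. Therefore,
\[ \Pr[i \in \bar{G}_{l+1} \mid i \in S_l, \mathcal{G}] = \Pr[Q_l \le |\hat{f}_{il}| < T_l, K_i = 1 \mid i \in S_l, \mathcal{G}] \]
\[ = \Pr[K_i = 1 \mid Q_l \le |\hat{f}_{il}| < T_l, i \in S_l, \mathcal{G}] \cdot \Pr[Q_l \le |\hat{f}_{il}| < T_l \mid i \in S_l, \mathcal{G}] \]
\[ = (1/2)(1 - p_{il}) \tag{67} \]
since, (a) $K_i$ is independent of all other random bits, and, (b) $\Pr[Q_l \le |\hat{f}_{il}| < T_l \mid i \in S_l, \mathcal{G}] + \Pr[|\hat{f}_{il}| \ge T_l \mid i \in S_l, \mathcal{G}] = \Pr[|\hat{f}_{il}| \ge Q_l \mid i \in S_l, \mathcal{G}] = 1$. Eliminating $p_{il}$ using (65) and (67), we have,
\[ 2\Pr[i \in \bar{G}_{l+1} \mid i \in S_l, \mathcal{G}] + \Pr[i \in \bar{G}_l \mid i \in S_l, \mathcal{G}] = 1 \tag{68} \]
Multiplying Eqn. (68) by $\Pr[i \in S_l \mid \mathcal{G}]$, we have,
\[ 2\Pr[i \in \bar{G}_{l+1}, i \in S_l \mid \mathcal{G}] + \Pr[i \in \bar{G}_l, i \in S_l \mid \mathcal{G}] = \Pr[i \in S_l \mid \mathcal{G}] . \tag{69} \]
By Lemma 45, if $i \in$ lmargin$(G_l)$, then $l_d(i) \ge l$ and $l_d(i) = l$ (or, $|\hat{f}_{il}| \ge Q_l$) iff $i \in S_l$. By construction therefore, ($i \in \bar{G}_l$ or $i \in \bar{G}_{l+1}$) iff $i \in S_l$. This proves part 2(b).

Thus, $i \in \bar{G}_{l+1}$ implies $i \in S_l$ and $i \in \bar{G}_l$ also implies that $i \in S_l$. Hence, Eqn. (69) can be written as
\[ 2\Pr[i \in \bar{G}_{l+1} \mid \mathcal{G}] + \Pr[i \in \bar{G}_l \mid \mathcal{G}] = \Pr[i \in S_l \mid \mathcal{G}] = 2^{-l} \pm n^{-c} \tag{70} \]
using Fact 43. Multiplying by $2^l$ gives part 2(c) of the lemma.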
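The reason the weights $2^l$ and $2^{l+1}$ appear in part 2(c) is that they make the scaled membership probabilities sum to 1 regardless of the unknown value $p_{il}$. The sketch below illustrates this with exact arithmetic in an idealised setting that ignores the $\pm n^{-c}$ conditioning error, that is, it assumes $\Pr[i \in S_l] = 2^{-l}$ exactly; the function name is hypothetical.

```python
from fractions import Fraction

def scaled_membership_mass(l: int, p_il: Fraction) -> Fraction:
    """2^(l+1) * Pr[i in G_{l+1}] + 2^l * Pr[i in G_l] for an lmargin item.

    Idealised model: Pr[i in S_l] = 2^-l, the item joins sampled group l
    with conditional probability p_il, and otherwise joins group l+1
    when the fair coin K_i comes up 1.
    """
    pr_S = Fraction(1, 2**l)
    pr_group_l = p_il * pr_S
    pr_group_l_plus_1 = Fraction(1, 2) * (1 - p_il) * pr_S
    return 2 ** (l + 1) * pr_group_l_plus_1 + 2**l * pr_group_l
```

The coin toss halves the probability of landing in group $l+1$, and the doubled weight $2^{l+1}$ compensates exactly, so the total is independent of $p_{il}$.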

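Case: $i \in$ rmargin$(G_l)$. Assume that $\mathcal{G}$ holds. By Lemma 45, $l_d(i) \in \{l-1, l\}$, that is, $l_d(i) \ge l-1$ and $l_d(i) < l+1$. Since $l_d(i) \ge l-1$, it follows that $i \notin \bar{G}_r$ for any $r < l-1$. If $i \in S_l$, we have,
\[ |\hat{f}_{il}| \ge |f_i| - \epsilon T_l \ge T_{l-1} - 2\epsilon T_{l-1} - \epsilon T_l = T_l \left( (2\alpha)^{1/2} - \epsilon (2(2\alpha)^{1/2} + 1) \right) > T_l \]
by the choice of parameters $\alpha$ and $\epsilon = 1/(27p)$. Hence, if $i \notin \bar{G}_{l-1}$ and $i \in S_l$, then $i \in \bar{G}_l$. In other words,
\[ \Pr[i \in \bar{G}_l \mid i \notin \bar{G}_{l-1}, i \in S_l, \mathcal{G}] = 1 . \]
If $i \in \bar{G}_r$ for some $r \ge l+1$, then $i \in S_l$ and this implies that $i \in \bar{G}_l$, which is a contradiction. Hence,
\[ \Pr[i \in \cup_{r \notin \{l-1, l\}} \bar{G}_r \mid \mathcal{G}] = 0 . \]
By construction, we have,
\[ \Pr[i \in \bar{G}_{l-1} \mid i \in S_{l-1}, \mathcal{G}] = \Pr[|\hat{f}_{i,l-1}| \ge T_{l-1} \mid i \in S_{l-1}, \mathcal{G}] = p_{i,l-1} \text{ (say)} \tag{71} \]
\[ \Pr[i \in \bar{G}_l \mid i \in S_{l-1}, \mathcal{G}] = \Pr[Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1} \text{ and } K_i = 1 \mid i \in S_{l-1}, \mathcal{G}] + \Pr[|\hat{f}_{i,l-1}| < Q_{l-1}, i \in S_l, |\hat{f}_{il}| \ge T_l \mid i \in S_{l-1}, \mathcal{G}] = A + B \tag{72} \]
where we let $A$ and $B$ denote the probability expressions in the first and second terms in the RHS respectively of Eqn. (72). Then,
\[ A = \Pr[Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1}, K_i = 1 \mid i \in S_{l-1}, \mathcal{G}] \]
\[ = \Pr[K_i = 1 \mid Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1}, i \in S_{l-1}, \mathcal{G}] \cdot \Pr[Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1} \mid i \in S_{l-1}, \mathcal{G}] \]
\[ = (1/2)\Pr[Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1} \mid i \in S_{l-1}, \mathcal{G}] \tag{73} \]
Therefore, for $i \in$ rmargin$(G_l)$, $i$ could possibly be a member of $\bar{G}_{l-1}$, which can happen only if $i \in S_{l-1}$. However, if $i \notin \bar{G}_{l-1}$ and $i \in S_{l-1}$, then $i$ can possibly be a member of $\bar{G}_l$. This can happen in two ways, either (i) $Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1}$ and the coin toss $K_i = 1$, or, (ii) $Q_{l-1} > |\hat{f}_{i,l-1}|$ and $i \in S_l$ and $|\hat{f}_{il}| \ge T_l$. In the latter case, if $i \in S_l$, then $|\hat{f}_{il}|$ is at least $T_l$ with probability 1, conditional on $\mathcal{G}$. This follows from Lemma 45, part 3(b). In particular, $i \notin \bar{G}_{l'}$ for any $l' \notin \{l-1, l\}$.

Hence
\[ B = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1}, i \in S_l, |\hat{f}_{il}| \ge T_l \mid i \in S_{l-1}, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1}, i \in S_l \mid i \in S_{l-1}, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_l, \mathcal{G}] \cdot \Pr[i \in S_l \mid i \in S_{l-1}, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_l, \mathcal{G}] \cdot (1/2 \pm n^{-c}) \]
Note that $\Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_l] = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_{l-1}]$ for the following reason. The estimate $\hat{f}_{i,l-1}$ is a function of the frequencies of the items that conflict with $i$ in the set of hash buckets to which $i$ maps in the HH$_{l-1}$ structure. By construction of the hash function, whether $i$ maps to the next level $l$ depends on whether $g_l(i) = 1$, which is independent of the hash functions $g_1, g_2, \ldots, g_{l-1}$. Hence,
\[ \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_l] = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_{l-1}] \]
Using Fact 43, we have,
\[ \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_l, \mathcal{G}] = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_{l-1}, \mathcal{G}] \pm n^{-c} \]
Thus Eqn. (72) may be written as
\[ \Pr[i \in \bar{G}_l \mid i \in S_{l-1}, \mathcal{G}] = A + B \]
\[ = (1/2)\Pr[Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1} \mid i \in S_{l-1}, \mathcal{G}] + (1/2)\Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_{l-1}, \mathcal{G}] \pm O(n^{-c}) \]
\[ = (1/2)\Pr[|\hat{f}_{i,l-1}| < T_{l-1} \mid i \in S_{l-1}, \mathcal{G}] \pm O(n^{-c}) \]
\[ = (1/2)(1 - p_{i,l-1}) \pm O(n^{-c}) . \tag{74} \]
From Eqns. (71) and (74) we obtain,
\[ 2\Pr[i \in \bar{G}_l \mid i \in S_{l-1}, \mathcal{G}] + \Pr[i \in \bar{G}_{l-1} \mid i \in S_{l-1}, \mathcal{G}] = 1 \pm O(n^{-c}) . \tag{75} \]
Multiplying Eqn. (75) by $\Pr[i \in S_{l-1} \mid \mathcal{G}]$, we have,
\[ 2\Pr[i \in \bar{G}_l, i \in S_{l-1} \mid \mathcal{G}] + \Pr[i \in \bar{G}_{l-1}, i \in S_{l-1} \mid \mathcal{G}] = \Pr[i \in S_{l-1} \mid \mathcal{G}] (1 \pm O(n^{-c})) . \tag{76} \]
From the discussion after Eqn. (73), it follows that $i$ may belong to $\bar{G}_{l-1} \cup \bar{G}_l$, and in either case, this is possible only if $i \in S_{l-1}$. This proves part 3(b).

Thus, $i \in \bar{G}_l$ or $i \in \bar{G}_{l-1}$ implies that $i \in S_{l-1}$. Hence, Eqn. (76) is equivalent to
\[ 2\Pr[i \in \bar{G}_l \mid \mathcal{G}] + \Pr[i \in \bar{G}_{l-1} \mid \mathcal{G}] = (2^{-(l-1)} \pm n^{-c})(1 \pm O(n^{-c})) = 2^{-(l-1)} \pm O(n^{-c}) \]
Multiplying by $2^{l-1}$ gives statement 3(d) of the lemma.

□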
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E Approximate pair-wise independence of the sampling 

In this section, we prove an approximate pair-wise independence property of the sampling technique.
Lemma 46. Let $i \ne j$. Then, $\Pr[i \in S_l \mid j \in S_r, \mathcal{G}] = 2^{-l} \pm n^{-c}$.

Proof. By pair-wise independence of the hash functions $\{g_l\}$ mapping items to levels, we have $\Pr[i \in S_l \mid j \in S_r] = \Pr[i \in S_l] = 2^{-l}$. By Fact 43, $\Pr[i \in S_l \mid j \in S_r, \mathcal{G}] = 2^{-l} \pm n^{-c}$. □
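The unconditioned identity $\Pr[i \in S_l \mid j \in S_r] = 2^{-l}$ can be confirmed by exhaustive enumeration over a small pairwise-independent family. The sketch below is not the paper's hash family: it assumes, purely for illustration, the GF(2) family $g_{a,b}(x) = \langle a, x \rangle + b \bmod 2$ on 3-bit items, with an independent seed per level and $S_l = \{x : g_1(x) = \cdots = g_l(x) = 1\}$.

```python
from fractions import Fraction
from itertools import product

# Illustrative pairwise-independent family on 3-bit items (an assumption,
# not the construction used in the paper).
FAMILY = [(a, b) for a in range(8) for b in range(2)]

def g(seed, x: int) -> int:
    """Hash bit <a, x> + b over GF(2)."""
    a, b = seed
    return (bin(a & x).count("1") + b) % 2

def in_level(seeds, x: int, l: int) -> bool:
    """x is in S_l iff the first l level hashes all map x to 1."""
    return all(g(seeds[k], x) == 1 for k in range(l))

def cond_prob(i: int, l: int, j: int, r: int, depth: int) -> Fraction:
    """Exact Pr[i in S_l | j in S_r] over all seed tuples of length depth."""
    hits = total = 0
    for seeds in product(FAMILY, repeat=depth):
        if in_level(seeds, j, r):
            total += 1
            hits += in_level(seeds, i, l)
    return Fraction(hits, total)
```

For distinct items the conditional probability equals $2^{-l}$ exactly, including when both items are conditioned at the same level.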


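E.1 Sampling probability of items conditional on another item mapping to a level

Restated Lemma (Restatement of Lemma 10). Let $i, j \in [n]$, $i \ne j$ and $j \in G_r$. Then,
\[ \sum_{r'=0}^{L} 2^{r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 1 \pm O(2^r \cdot n^{-c}) . \]
In particular, the following hold.

1) Suppose $j \in$ mid$(G_r)$. Then,
\[ 2^r \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] = 1 \pm 2^r n^{-c} . \]
Further, for any $r' \ne r$, $\Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 0$.

2) If $j \in$ lmargin$(G_r)$, then,
\[ 2^{r+1} \Pr[j \in \bar{G}_{r+1} \mid i \in S_l, \mathcal{G}] + 2^r \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] = 1 \pm 2^{r+1} n^{-c} . \]
Further, for any $r' \notin \{r, r+1\}$, $\Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 0$.

3) If $j \in$ rmargin$(G_r)$, then
\[ 2^r \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] + 2^{r-1} \Pr[j \in \bar{G}_{r-1} \mid i \in S_l, \mathcal{G}] = 1 \pm 2^{r+1} n^{-c} . \]
Further, for any $r' \notin \{r-1, r\}$, $\Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 0$.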

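Proof of Lemma 10. The proof proceeds identically as in the proof of Lemma 8 except that all probabilities are, in addition to being conditional on $\mathcal{G}$, also conditional on $i \in S_l$.

Case 1: $j \in$ mid$(G_r)$. Conditional on $\mathcal{G}$, as argued in the proof of Lemma 45 part 1(b), $j \in \bar{G}_r$ iff $j \in S_r$. Therefore,
\[ \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] = \Pr[j \in S_r \mid i \in S_l, \mathcal{G}] \in 2^{-r} \pm n^{-c} \]
where the last step follows from Lemma 46.

Case 2: $j \in$ lmargin$(G_r)$. Let
\[ p'_{jr} = \Pr[|\hat{f}_{jr}| \ge T_r \mid i \in S_l, j \in S_r, \mathcal{G}] \tag{77} \]
Then
\[ \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] = \Pr[|\hat{f}_{jr}| \ge T_r, j \in S_r \mid i \in S_l, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{jr}| \ge T_r \mid i \in S_l, j \in S_r, \mathcal{G}] \cdot \Pr[j \in S_r \mid i \in S_l, \mathcal{G}] \]
\[ = p'_{jr} \cdot (2^{-r} \pm n^{-c}), \text{ by Lemma 46} \tag{78} \]
Further,
\[ \Pr[j \in \bar{G}_{r+1} \mid i \in S_l, \mathcal{G}] = \Pr[Q_r \le |\hat{f}_{jr}| < T_r, j \in S_r, K_j = 1 \mid i \in S_l, \mathcal{G}] + \Pr[|\hat{f}_{jr}| < Q_r, j \in S_{r+1}, |\hat{f}_{j,r+1}| \ge T_{r+1} \mid i \in S_l, \mathcal{G}] \tag{79} \]
Conditional on $\mathcal{G}$, $|\hat{f}_{jr}| \ge |f_j| - \epsilon T_r \ge T_r - \epsilon T_r = Q_r$, since $j \in$ lmargin$(G_r)$. Hence, $\Pr[|\hat{f}_{jr}| < Q_r \mid \mathcal{G}] = 0$. Further, since the coin toss $K_j = 1$ is independent of other random bits, Eqn. (79) becomes
\[ \Pr[j \in \bar{G}_{r+1} \mid i \in S_l, \mathcal{G}] = (1/2)\Pr[Q_r \le |\hat{f}_{jr}| < T_r, j \in S_r \mid i \in S_l, \mathcal{G}] \]
\[ = (1/2)\Pr[Q_r \le |\hat{f}_{jr}| < T_r \mid i \in S_l, j \in S_r, \mathcal{G}] \cdot \Pr[j \in S_r \mid i \in S_l, \mathcal{G}] \]
\[ = (1/2)(1 - p'_{jr})(2^{-r} \pm n^{-c}) \tag{80} \]
Multiplying Eqn. (80) by $2^{r+1}$, multiplying Eqn. (78) by $2^r$ and adding, we have,
\[ 2^{r+1} \Pr[j \in \bar{G}_{r+1} \mid i \in S_l, \mathcal{G}] + 2^r \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] = 1 \pm O(2^r n^{-c}) \]
which proves statement (2) of the lemma.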

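Case 3: $j \in$ rmargin$(G_r)$. Then,
\[ \Pr[j \in \bar{G}_{r-1} \mid i \in S_l, \mathcal{G}] = \Pr[|\hat{f}_{j,r-1}| \ge T_{r-1}, j \in S_{r-1} \mid i \in S_l, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{j,r-1}| \ge T_{r-1} \mid i \in S_l, j \in S_{r-1}, \mathcal{G}] \cdot \Pr[j \in S_{r-1} \mid i \in S_l, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{j,r-1}| \ge T_{r-1} \mid i \in S_l, j \in S_{r-1}, \mathcal{G}] \, (2^{-(r-1)} \pm n^{-c}) \tag{81} \]
Also,
\[ \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] = \Pr[j \in S_{r-1}, Q_{r-1} \le |\hat{f}_{j,r-1}| < T_{r-1}, K_j = 1 \mid i \in S_l, \mathcal{G}] + \Pr[|\hat{f}_{j,r-1}| < Q_{r-1}, j \in S_r, |\hat{f}_{j,r}| \ge T_r \mid i \in S_l, \mathcal{G}] \tag{82} \]
For $j \in$ rmargin$(G_r)$ and conditional on $\mathcal{G}$, by following the argument of Lemma 45, it follows that if $j \in S_r$ then $|\hat{f}_{jr}| \ge T_r$, viz., $|\hat{f}_{jr}| \ge |f_j| - \epsilon T_r \ge T_{r-1} - 2\epsilon T_{r-1} - \epsilon T_r > T_r$. Therefore,
\[ \Pr[|\hat{f}_{j,r-1}| < Q_{r-1}, j \in S_r, |\hat{f}_{j,r}| \ge T_r \mid i \in S_l, \mathcal{G}] = \Pr[|\hat{f}_{j,r-1}| < Q_{r-1}, j \in S_r \mid i \in S_l, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid i \in S_l, j \in S_r, \mathcal{G}] \cdot \Pr[j \in S_r \mid i \in S_l, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid i \in S_l, j \in S_r, \mathcal{G}] \, (2^{-r} \pm n^{-c}) \tag{83} \]
The estimate $\hat{f}_{j,r-1}$ is obtained at level $r-1$, and this is independent of whether $j$ (or any other subset of items) is a member of $S_r$. The latter is a consequence of the level-wise product of independent hash values, namely, $j \in S_r$ iff $j \in S_{r-1}$ and $g_r(j) = 1$. Therefore,
\[ \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid i \in S_l, j \in S_r] = \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid i \in S_l, j \in S_{r-1}, g_r(j) = 1] \]
\[ = \frac{\Pr[|\hat{f}_{j,r-1}| < Q_{r-1}, g_r(j) = 1 \mid i \in S_l, j \in S_{r-1}]}{\Pr[g_r(j) = 1 \mid i \in S_l, j \in S_{r-1}]} \]
\[ = \frac{\Pr[g_r(j) = 1 \mid |\hat{f}_{j,r-1}| < Q_{r-1}, i \in S_l, j \in S_{r-1}]}{\Pr[g_r(j) = 1 \mid i \in S_l, j \in S_{r-1}]} \cdot \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid i \in S_l, j \in S_{r-1}] \tag{84} \]
Consider the numerator term of the fraction above: $\Pr[g_r(j) = 1 \mid |\hat{f}_{j,r-1}| < Q_{r-1}, i \in S_l, j \in S_{r-1}]$. The event $|\hat{f}_{j,r-1}| < Q_{r-1}$ depends only on the set of elements that have mapped to $S_{r-1}$, and is independent of whether $g_r(j) = 1$. Similarly, $j \in S_{r-1}$ is independent of whether $g_r(j) = 1$. Thus, the numerator term equals $\Pr[g_r(j) = 1 \mid i \in S_l]$ and the denominator term also equals the same, for the same reasons. Hence, Eqn. (84) becomes
\[ \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid i \in S_l, j \in S_r] = \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid i \in S_l, j \in S_{r-1}] \tag{85} \]
Now, conditioning with respect to $\mathcal{G}$, we have,
\[ \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid j \in S_r, i \in S_l, \mathcal{G}] \in \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid j \in S_{r-1}, i \in S_l, \mathcal{G}] \pm n^{-c} \tag{86} \]
Substituting Eqn. (86) in Eqn. (83), we have,
\[ \Pr[|\hat{f}_{j,r-1}| < Q_{r-1}, j \in S_r, |\hat{f}_{j,r}| \ge T_r \mid i \in S_l, \mathcal{G}] = \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid j \in S_{r-1}, i \in S_l, \mathcal{G}] \, (2^{-r} \pm n^{-c}) \pm 2^{-r} n^{-c} \tag{87} \]
Consider the first probability term in the RHS of Eqn. (82).
\[ \Pr[j \in S_{r-1}, Q_{r-1} \le |\hat{f}_{j,r-1}| < T_{r-1}, K_j = 1 \mid i \in S_l, \mathcal{G}] = (1/2)\Pr[Q_{r-1} \le |\hat{f}_{j,r-1}| < T_{r-1} \mid i \in S_l, j \in S_{r-1}, \mathcal{G}] \cdot \Pr[j \in S_{r-1} \mid i \in S_l, \mathcal{G}] \]
\[ = \Pr[Q_{r-1} \le |\hat{f}_{j,r-1}| < T_{r-1} \mid i \in S_l, j \in S_{r-1}, \mathcal{G}] \cdot (1/2)(2^{-(r-1)} \pm n^{-c}) \tag{88} \]
Substituting Eqns. (87) and (88) in Eqn. (82), we have,
\[ \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] = \Pr[Q_{r-1} \le |\hat{f}_{j,r-1}| < T_{r-1} \mid i \in S_l, j \in S_{r-1}, \mathcal{G}] \, (2^{-r} \pm O(n^{-c})) + \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid j \in S_{r-1}, i \in S_l, \mathcal{G}] \, (2^{-r} \pm n^{-c}) \pm 2^{-r} n^{-c} \tag{89} \]
Multiplying Eqn. (81) by $2^{r-1}$ and Eqn. (89) by $2^r$ and adding, we obtain
\[ 2^{r-1} \Pr[j \in \bar{G}_{r-1} \mid i \in S_l, \mathcal{G}] + 2^r \Pr[j \in \bar{G}_r \mid i \in S_l, \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{j,r-1}| \ge T_{r-1} \mid i \in S_l, j \in S_{r-1}, \mathcal{G}] \pm 2^{r-1} n^{-c} \]
\[ + \Pr[Q_{r-1} \le |\hat{f}_{j,r-1}| < T_{r-1} \mid i \in S_l, j \in S_{r-1}, \mathcal{G}] \, (1 \pm O(2^r n^{-c})) \]
\[ + \Pr[|\hat{f}_{j,r-1}| < Q_{r-1} \mid j \in S_{r-1}, i \in S_l, \mathcal{G}] \pm O(2^r n^{-c}) \]
\[ = 1 \pm O(2^r n^{-c}) . \]
This proves statement (3) of the Lemma.

□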


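E.2 Sampling probability of an item conditional on another item being sampled
Restated Lemma (Lemma 12). Suppose $i \in G_l$, $j \in G_m$ and $j \ne i$. Then,
\[ \sum_{r, r'=0}^{L} 2^{r+r'} \Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] = 1 \pm O((2^l + 2^m) n^{-c}) . \]

Proof of Lemma 12. Assume $\mathcal{G}$ holds for all the arguments in the proof.

Case 1: $i \in$ mid$(G_l)$. Then,
\[ \Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[i \in \bar{G}_r \mid j \in \bar{G}_{r'}, \mathcal{G}] \cdot \Pr[j \in \bar{G}_{r'} \mid \mathcal{G}] \]
Conditional on $\mathcal{G}$, $i \in \bar{G}_r$ iff $r = l$ and $i \in S_l$. That is, for $r \ne l$, $\Pr[i \in \bar{G}_r \mid j \in \bar{G}_{r'}, \mathcal{G}] = 0$.

Therefore,
\[ \Pr[i \in \bar{G}_l \mid j \in \bar{G}_{r'}, \mathcal{G}] \cdot \Pr[j \in \bar{G}_{r'} \mid \mathcal{G}] \]
\[ = \Pr[i \in S_l \mid j \in \bar{G}_{r'}, \mathcal{G}] \cdot \Pr[j \in \bar{G}_{r'} \mid \mathcal{G}], \text{ by Lemma 45 part 2(b)} \]
\[ = \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \cdot \Pr[i \in S_l \mid \mathcal{G}], \text{ by Bayes' rule} \]
\[ = \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \cdot (2^{-l} \pm n^{-c}) \]
Multiplying by $2^l$, we have,
\[ 2^l \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \, (1 \pm 2^l n^{-c}) \tag{90} \]
By Lemma 10, we have $\sum_{r'=0}^{L} 2^{r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 1 \pm 2^{m+1} n^{-c}$. Therefore, multiplying both sides of Eqn. (90) by $2^{r'}$ and summing over $r'$, we have,
\[ \sum_{r'=0}^{L} 2^{l+r'} \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] = (1 \pm 2^{m+1} n^{-c})(1 \pm 2^l n^{-c}) = 1 \pm O((2^m + 2^l) n^{-c}) \tag{91} \]
Since $\Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] = 0$ for $r \ne l$, we can equivalently write Eqn. (91) as
\[ \sum_{r, r'=0}^{L} 2^{r+r'} \Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] = 1 \pm O((2^m + 2^l) n^{-c}) \]

Case 2: $i \in$ lmargin$(G_l)$. Then, $i$ may belong to either $\bar{G}_l$ or $\bar{G}_{l+1}$ and to no other sampled group, and $i \in \bar{G}_l \cup \bar{G}_{l+1}$ iff $i \in S_l$, by Lemma 8 parts 2(a) and 2(b) respectively.
\[ \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[i \in S_l, |\hat{f}_{il}| \ge T_l, j \in \bar{G}_{r'} \mid \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{il}| \ge T_l \mid j \in \bar{G}_{r'}, i \in S_l, \mathcal{G}] \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \cdot \Pr[i \in S_l \mid \mathcal{G}] \]
\[ = \Pr[|\hat{f}_{il}| \ge T_l \mid j \in \bar{G}_{r'}, i \in S_l, \mathcal{G}] \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \, (2^{-l} \pm n^{-c}) \tag{92} \]
Let
\[ p_{il} = \Pr[|\hat{f}_{il}| \ge T_l \mid j \in \bar{G}_{r'}, i \in S_l, \mathcal{G}] \]
Multiplying both sides of Eqn. (92) by $2^l$, we obtain
\[ 2^l \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] = p_{il} \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \, (1 \pm 2^l n^{-c}) \tag{93} \]
We now consider the case when $i \in \bar{G}_{l+1}$. By construction, $i \in \bar{G}_{l+1}$ in two ways, either (i) $i \in S_l$, $Q_l \le |\hat{f}_{il}| < T_l$ and $K_i = 1$, or, (ii) $i \in S_l$, $|\hat{f}_{il}| < Q_l$ and $i \in S_{l+1}$ and $|\hat{f}_{i,l+1}| \ge T_{l+1}$. Possibility (ii) cannot hold since, by Lemma 45 (1b), $i \in S_l$ iff $l_d(i) = l$, which by definition is that $|\hat{f}_{il}| \ge Q_l$. These calculations are conditioned on $\mathcal{G}$ and therefore hold conditioned on $j \in \bar{G}_{r'}$ as well. Hence,
\[ \Pr[i \in \bar{G}_{l+1}, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[i \in S_l, (Q_l \le |\hat{f}_{il}| < T_l), K_i = 1, j \in \bar{G}_{r'} \mid \mathcal{G}] \]
\[ = (1/2)\Pr[Q_l \le |\hat{f}_{il}| < T_l \mid i \in S_l, j \in \bar{G}_{r'}, \mathcal{G}] \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \cdot \Pr[i \in S_l \mid \mathcal{G}] \]
\[ = (1/2)(1 - p_{il}) \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \, (2^{-l} \pm n^{-c}) \tag{94} \]
or, by multiplying both sides of Eqn. (94) by $2^{l+1}$,
\[ 2^{l+1} \Pr[i \in \bar{G}_{l+1}, j \in \bar{G}_{r'} \mid \mathcal{G}] = (1 - p_{il}) \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \, (1 \pm 2^l n^{-c}) \tag{95} \]
Adding Eqns. (93) and (95), we have,
\[ 2^{l+1} \Pr[i \in \bar{G}_{l+1}, j \in \bar{G}_{r'} \mid \mathcal{G}] + 2^l \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \, (1 \pm 2^{l+1} n^{-c}) . \tag{96} \]
By Lemma 10, $\sum_{r'=0}^{L} 2^{r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] = 1 \pm O(2^m n^{-c})$. Therefore, multiplying Eqn. (96) by $2^{r'}$ and summing over $r'$, we have,
\[ \sum_{r'=0}^{L} \left( 2^{l+1+r'} \Pr[i \in \bar{G}_{l+1}, j \in \bar{G}_{r'} \mid \mathcal{G}] + 2^{l+r'} \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] \right) \]
\[ = \sum_{r'=0}^{L} 2^{r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \, (1 \pm 2^{l+1} n^{-c}) \]
\[ = (1 \pm O(2^m n^{-c}))(1 \pm O(2^l n^{-c})) \]
\[ = 1 \pm O((2^l + 2^m) n^{-c}) \]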


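Since $\Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] = 0$ for any $r \notin \{l, l+1\}$, we can rewrite the above equation as
\[ \sum_{r, r'=0}^{L} 2^{r+r'} \Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] = 1 \pm O((2^m + 2^l) n^{-c}) \]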

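Case 3: $i \in$ rmargin$(G_l)$. If $j \in$ lmargin$(G_m)$ or $j \in$ mid$(G_m)$, then we can interchange the roles of $i$ and $j$ and the lemma is proved. Hence, we may now assume that $j \in$ rmargin$(G_m)$. Let $m \le l$ without loss of generality.

By Lemma 8 part (3), $i \in \bar{G}_{l-1} \cup \bar{G}_l$ and this implies that $i \in S_{l-1}$. Also, $i \notin \cup_{l' \notin \{l-1, l\}} \bar{G}_{l'}$ (with prob. 1). Let $p_{i,l-1,j,r'} = \Pr[|\hat{f}_{i,l-1}| \ge T_{l-1} \mid j \in \bar{G}_{r'}, i \in S_{l-1}, \mathcal{G}]$. Then,
\[ \Pr[i \in \bar{G}_{l-1}, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[|\hat{f}_{i,l-1}| \ge T_{l-1}, i \in S_{l-1}, j \in \bar{G}_{r'} \mid \mathcal{G}] \]
\[ = p_{i,l-1,j,r'} \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \cdot \Pr[i \in S_{l-1} \mid \mathcal{G}] \]
\[ = p_{i,l-1,j,r'} \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \cdot (2^{-(l-1)} \pm n^{-c}) \tag{97} \]
Let $q_{i,l-1,j,r'} = \Pr[Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1} \mid i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}]$. By Lemma 8 part 3(c), $\{i \in S_l, l_d(i) \ne l-1\} \subset \{i \in \bar{G}_l\}$. Then,
\[ \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[Q_{l-1} \le |\hat{f}_{i,l-1}| < T_{l-1}, K_i = 1, i \in S_{l-1}, j \in \bar{G}_{r'} \mid \mathcal{G}] + \Pr[|\hat{f}_{i,l-1}| < Q_{l-1}, i \in S_l, j \in \bar{G}_{r'} \mid \mathcal{G}] \]
\[ = (1/2) q_{i,l-1,j,r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \Pr[i \in S_{l-1} \mid \mathcal{G}] + \Pr[|\hat{f}_{i,l-1}| < Q_{l-1}, i \in S_l, j \in \bar{G}_{r'} \mid \mathcal{G}] \]
\[ = q_{i,l-1,j,r'} \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \, (2^{-l} \pm O(n^{-c})) + \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_l, j \in \bar{G}_{r'}, \mathcal{G}] \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \Pr[i \in S_l \mid \mathcal{G}] \tag{98} \]
Consider the following term derived from the second term in the above sum.
\[ \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_l, j \in \bar{G}_{r'}, \mathcal{G}] = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid g_l(i) = 1, i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] \]
\[ = \frac{\Pr[|\hat{f}_{i,l-1}| < Q_{l-1}, g_l(i) = 1 \mid i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}]}{\Pr[g_l(i) = 1 \mid i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}]} \tag{99} \]
where
\[ \Pr[|\hat{f}_{i,l-1}| < Q_{l-1}, g_l(i) = 1 \mid i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] = \Pr[g_l(i) = 1 \mid i \in S_{l-1}, |\hat{f}_{i,l-1}| < Q_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] \cdot \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] \]
The event $g_l(i) = 1$ is independent of the value of $\hat{f}_{i,l-1}$, since the latter depends on the values of the $g_{l'}(k)$'s for $k \in [n] \setminus \{i\}$ and $1 \le l' < l$. Now conditional on $\mathcal{G}$ and given that $j \in$ rmargin$(G_m)$, for $m \le l$, the event $j \in \bar{G}_{r'}$ has zero probability unless $r' \in \{m-1, m\}$.

Case 3.1. $r' = m-1$. In this case, $j \in \bar{G}_{m-1}$. Since $j \in$ rmargin$(G_m)$, the event $j \in \bar{G}_{m-1}$ depends only on the value of $\hat{f}_{j,m-1}$. Since $m \le l$, the random bit defining $g_l$ is independent of the values of the random bits that determine $\hat{f}_{j,m-1}$. Therefore,
\[ \Pr[g_l(i) = 1 \mid i \in S_{l-1}, |\hat{f}_{i,l-1}| < Q_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] = \Pr[g_l(i) = 1 \mid \mathcal{G}] \]
Arguing similarly, $\Pr[g_l(i) = 1 \mid i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] = \Pr[g_l(i) = 1 \mid \mathcal{G}]$.
Therefore, it follows from Eqn. (99) that
\[ \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_l, j \in \bar{G}_{r'}, \mathcal{G}] = \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] \tag{100} \]
Case 3.2. Suppose $r' = m$. Since $j \in$ rmargin$(G_m)$, therefore, $j \in \bar{G}_m$ is equivalent to the event $|\hat{f}_{j,m-1}| < Q_{m-1}$ and $j \in S_m$. If $m < l$, then the event $g_l(i) = 1$ is independent of the values of $\hat{f}_{j,m-1}$ and the event $j \in S_m$. Hence the same conclusion as Eqn. (100) holds when $r' = m$ and $m < l$.

Now suppose $r' = m$ and $m = l$. Then, we have,
\[ \Pr[g_l(i) = 1 \mid i \in S_{l-1}, |\hat{f}_{i,l-1}| < Q_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] = \Pr[g_l(i) = 1 \mid j \in \bar{G}_{r'}, \mathcal{G}] \]
\[ = \Pr[g_l(i) = 1 \mid |\hat{f}_{j,l-1}| < Q_{l-1}, g_l(j) = 1, j \in S_{l-1}, \mathcal{G}] \]
\[ = \Pr[g_l(i) = 1 \mid g_l(j) = 1, \mathcal{G}] \]
\[ = \Pr[g_l(i) = 1 \mid \mathcal{G}] . \]
Hence, Eqn. (100) continues to hold in this case as well. Thus in all cases, Eqn. (100) holds. Substituting this into Eqn. (98), we have,
\[ \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] = q_{i,l-1,j,r'} \cdot \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \, (2^{-l} \pm O(n^{-c})) + \Pr[|\hat{f}_{i,l-1}| < Q_{l-1} \mid i \in S_{l-1}, j \in \bar{G}_{r'}, \mathcal{G}] \Pr[j \in \bar{G}_{r'} \mid i \in S_l, \mathcal{G}] \Pr[i \in S_l \mid \mathcal{G}] \]
\[ = (q_{i,l-1,j,r'} + 1 - p_{i,l-1,j,r'} - q_{i,l-1,j,r'}) \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \, (2^{-l} \pm O(n^{-c})) \]
\[ = (1 - p_{i,l-1,j,r'}) \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \, (2^{-l} \pm O(n^{-c})) \tag{101} \]
Multiplying Eqn. (97) by $2^{l-1}$ and Eqn. (101) by $2^l$, we have for $r' \in \{m-1, m\}$ that
\[ 2^{l-1} \Pr[i \in \bar{G}_{l-1}, j \in \bar{G}_{r'} \mid \mathcal{G}] + 2^l \Pr[i \in \bar{G}_l, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \, (1 \pm O(2^l n^{-c})) \tag{102} \]
The LHS of (102) can be equivalently written as $\sum_{r=0}^{L} 2^r \Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}]$, since, for $r \notin \{l-1, l\}$, $\Pr[i \in \bar{G}_r \mid \mathcal{G}] = 0$. Therefore,
\[ \sum_{r=0}^{L} 2^r \Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] = \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \, (1 \pm O(2^l n^{-c})) \tag{103} \]
By Lemma 10, we have,
\[ \sum_{r'=0}^{L} 2^{r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] = \sum_{r'=m-1}^{m} 2^{r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] = 1 \pm 2^m O(n^{-c}) \]
Combining with Eqn. (103), we have,
\[ \sum_{r=0}^{L} \sum_{r'=0}^{L} 2^{r+r'} \Pr[i \in \bar{G}_r, j \in \bar{G}_{r'} \mid \mathcal{G}] = \sum_{r'=0}^{L} 2^{r'} \Pr[j \in \bar{G}_{r'} \mid i \in S_{l-1}, \mathcal{G}] \, (1 \pm O(2^l n^{-c})) \]
\[ = (1 \pm O(2^m n^{-c}))(1 \pm O(2^l n^{-c})) = 1 \pm O((2^l + 2^m) n^{-c}) . \]

□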


F Application of Taylor polynomial estimator 

Throughout the remainder of this section, let Y denote a code given by Corollary [5j 

F.l Preliminaries 

Notation. We first partition the random seeds used by the algorithm by their functionality. For strings s and t, let $s \oplus t$ denote the string that is the concatenation of s and t.

Let $\bar{g}_l$ denote the random bit string representing the seed used to generate the hash function $g_l$, for $l \in \{0\} \cup [L]$, and let $\bar{g}$ denote the concatenation of the seed strings $\bar{g}_1 \oplus \bar{g}_2 \oplus \cdots \oplus \bar{g}_L$. For $l \in \{0\} \cup [L]$ and $j \in [s]$, let $\bar{h}_{HH,l,j}$ denote the random bit string used to generate the hash function corresponding to the jth hash table in the $\mathrm{HH}_l$ structure; let $\bar{h}_{HH,l}$ denote the concatenation of the random bit strings $\oplus_{j \in [s]} \bar{h}_{HH,l,j}$ and $\bar{h}_{HH}$ denote the concatenation of the random bit strings $\oplus_{l \in \{0,1,\ldots,L\}} \bar{h}_{HH,l}$. For $l \in \{0\} \cup [L]$ and $j \in [2s]$, let $\bar{h}_{lj}$ denote the random bit string used to generate the hash function $h_{lj}$ in the $\mathrm{TPEst}_l$ structure. Let $\bar{h}_l$ denote the random bit string $\oplus_{j \in [2s]} \bar{h}_{lj}$ and let $\bar{h}$ denote the concatenation $\bar{h} = \oplus_{l \in \{0,1,\ldots,L\}} \bar{h}_l$.

Let $\bar{\xi}_{HH,l,j}$ denote the random bit string used to generate the Rademacher family used by the jth table of the $\mathrm{HH}_l$ structure, for $l \in \{0,1,\ldots,L\}$ and $j \in [s]$. Let $\bar{\xi}_{HH,l} = \oplus_{j \in [s]} \bar{\xi}_{HH,l,j}$ and let $\bar{\xi}_{HH} = \oplus_{l \in \{0,1,\ldots,L\}} \bar{\xi}_{HH,l}$. Let $\bar{\xi}_{lj}$ denote the random seed that generates the Rademacher variables $\{\xi_{lj}(k)\}_{k \in [n]}$ used by the jth table in the $\mathrm{TPEst}_l$ structure, for $j \in [2s]$; let $\bar{\xi}_l = \oplus_{j \in [2s]} \bar{\xi}_{lj}$ and let $\bar{\xi} = \oplus_{l \in \{0,1,\ldots,L\}} \bar{\xi}_l$. Let $\zeta$ denote the random bit string used to estimate $F_2$.

The full random seed string used to update and maintain the Geometric-Hss structure is $\zeta \oplus \bar{g} \oplus \bar{h}_{HH} \oplus \bar{\xi}_{HH} \oplus \bar{h} \oplus \bar{\xi}$. In addition, during estimation, an n-dimensional random bit vector K is also used.

Note that the events in $\mathcal{G}$ are dependent only on $\zeta \oplus \bar{g} \oplus \bar{h}_{HH} \oplus \bar{\xi}_{HH}$. This is further explained in the table below.
Event             Random bit string that determines the event

GOODF2            $\zeta$
NOCOLL            $\bar{h}$
GOODEST           $\bar{h}_{HH}$
SMALLRES          $\bar{g}$
ACCUEST           $\bar{g} \oplus \bar{h}_{HH}$
GOODFINALLEVEL    $\bar{g} \oplus \bar{h}_{HH}$
SMALLHH           $\bar{g} \oplus \bar{h}_{HH}$


F.2 Basic properties of the application of Taylor polynomial estimator: Proof
of Lemma 13, Part I

For items $i, k \in [n]$ with $k \ne i$, hash table index $j \in [s]$ and $l \in [L] \cup \{0\}$, define the indicator
variable $u_{ikjl}$ to be 1 if $h_{lj}(i) = h_{lj}(k)$, and 0 otherwise.

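Proof of Lemma 13, parts (a), (b) and (e). Suppose $\mathcal{G}$ holds. The last statement of the lemma
follows from GOODFINALLEVEL, which is a sub-event of $\mathcal{G}$.

Let $l = l_d(i) \in \{0\} \cup [L-1]$. By ACCUEST, $|\hat{f}_{il} - f_i| \le \bar{\epsilon} T_l$. Since i is discovered at level l,
$|\hat{f}_{il}| \ge Q_l = T_l - \bar{\epsilon} T_l$. So, $|f_i| \ge |\hat{f}_{il}| - \bar{\epsilon} T_l \ge Q_l - \bar{\epsilon} T_l = T_l - 2\bar{\epsilon} T_l$ and therefore,

$$\frac{|\hat{f}_i - f_i|}{|f_i|} \le \frac{\bar{\epsilon} T_l}{(1 - 2\bar{\epsilon}) T_l} \le \frac{1/(27p)}{1 - 2/(27p)} \le \frac{1}{26p}$$

since $\bar{\epsilon} = (B/C)^{1/2} = 1/(27p)$ and $p \ge 2$. Therefore,

$$\frac{|\hat{f}_i - f_i|}{|\hat{f}_i|} \le \frac{\bar{\epsilon} T_l}{(1 - \bar{\epsilon}) T_l} \le \frac{1}{26p} .$$

This proves parts (a) and (e) of the lemma.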

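Let $j \in R_l(i)$ and $l_d(i) = l$. For $k \in [n]$, let $y_{lk}$ be an indicator variable that is 1 if $k \in S_l$ and
is 0 otherwise. Then,

$$X_{ijl} = \sum_{k \in [n]} f_k \cdot y_{lk} \cdot \xi_{lj}(k) \cdot u_{ikjl} \cdot \xi_{lj}(i) \cdot \mathrm{sgn}(\hat{f}_i) .$$

Since it is given that $l_d(i) = l$, it follows that

$$X_{ijl} = f_i \cdot \mathrm{sgn}(\hat{f}_i) + \sum_{k \in [n], k \ne i} f_k \cdot y_{lk} \cdot \xi_{lj}(k) \cdot u_{ikjl} \cdot \xi_{lj}(i) \cdot \mathrm{sgn}(\hat{f}_i) .$$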

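We now take expectations. Note that the events in $\mathcal{G}$ are independent of the Rademacher family
random bits $\xi_{lj}(k)$. Also, the event $u_{ikjl} = 1$ depends only on $\bar{g} \oplus \bar{h}_l$ and the event $l_d(i) = l$ depends
only on $\bar{g} \oplus \bar{h}_{HH}$. Therefore,

$$E_{\bar{\xi}}[X_{ijl} \mid l_d(i) = l, j \in R_l(i), \mathcal{G}]$$

$$= f_i \cdot \mathrm{sgn}(\hat{f}_i) + \sum_{k \in [n] \setminus \{i\}} f_k \cdot \mathrm{sgn}(\hat{f}_i) \cdot E_{\bar{\xi}}[\xi_{lj}(k) \cdot \xi_{lj}(i) \cdot y_{lk} \cdot u_{ikjl} \mid l_d(i) = l, j \in R_l(i), \mathcal{G}]$$

$$= f_i \cdot \mathrm{sgn}(\hat{f}_i) + \sum_{k \in S_l, k \ne i} f_k \cdot \mathrm{sgn}(\hat{f}_i) \cdot E_{\bar{\xi}}[\xi_{lj}(k)\xi_{lj}(i) \mid y_{lk} = 1, u_{ikjl} = 1, j \in R_l(i), \mathcal{G}] \cdot \Pr[u_{ikjl} = 1, y_{lk} = 1 \mid j \in R_l(i), \mathcal{G}]$$

$$= f_i \cdot \mathrm{sgn}(\hat{f}_i) + 0 \qquad (104)$$

since $\xi_{lj}(k)$ and $\xi_{lj}(i)$ depend only on $\bar{\xi}_{lj}$, which is independent of the conditioning events. The
expectation is zero by pair-wise independence and zero-expectation of the family $\{\xi_{lj}(s)\}_{s \in [n]}$.
Hence, Eqn. (104) becomes

$$E_{\bar{\xi}}[X_{ijl} \mid l_d(i) = l, j \in R_l(i), \mathcal{G}] = f_i \cdot \mathrm{sgn}(\hat{f}_i) = f_i \cdot \mathrm{sgn}(f_i) = |f_i| \qquad (105)$$

because, since $l_d(i) = l$, $|\hat{f}_{il}| \ge (1 - \bar{\epsilon}) T_l$ and therefore $\mathrm{sgn}(\hat{f}_i \pm \bar{\epsilon} T_l) = \mathrm{sgn}(\hat{f}_i)$, since $\bar{\epsilon} = 1/(27p) < 1/2$. Since $\mathcal{G}$ holds, by ACCUEST we have,

$$\mathrm{sgn}(f_i)\,\mathrm{sgn}(\hat{f}_i) = \mathrm{sgn}(\hat{f}_i \pm \bar{\epsilon} T_l)\,\mathrm{sgn}(\hat{f}_i) = \mathrm{sgn}(\hat{f}_i)\,\mathrm{sgn}(\hat{f}_i) = 1$$

and therefore $\mathrm{sgn}(f_i) = \mathrm{sgn}(\hat{f}_i)$. Hence Eqn. (105) holds.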

□ 
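The unbiasedness argument above can be checked on a toy bucket. The sketch below is only an illustration under stated assumptions: it uses fully independent signs (a stand-in for the pair-wise independent Rademacher family), a single hypothetical bucket of colliding items, and the true sign of the frequency in place of the estimated sign. The names `exact_expectation`, `f` and `bucket` are illustrative and do not come from the paper.

```python
from itertools import product

def sign(x):
    """Sign of x, with sign(0) taken as +1."""
    return 1 if x >= 0 else -1

def exact_expectation(f, i, bucket, sgn_est):
    """Average, over all sign vectors on the items in `bucket`, of
    X = (sum_{k in bucket} f[k] * xi[k]) * xi[i] * sgn_est."""
    total = 0
    count = 0
    for signs in product((-1, 1), repeat=len(bucket)):
        xi = dict(zip(bucket, signs))
        x = sum(f[k] * xi[k] for k in bucket) * xi[i] * sgn_est
        total += x
        count += 1
    return total / count

# Hypothetical frequencies; items 1, 2, 3 collide with item 0.
f = {0: -7, 1: 3, 2: 5, 3: -2}
bucket = [0, 1, 2, 3]
print(exact_expectation(f, 0, bucket, sign(f[0])))  # 7.0
```

The cross terms cancel exactly when averaged over the signs, so the mean equals the absolute value of the target frequency regardless of which other items share the bucket.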


F.3 Expectation of r?. 

Proof of Lemma By Lemma [T3j we have, 

I ^dii) = l,j G Ri(i),G] = \fi 

and therefore, 


% [Xiji I ld{i) = l,j G Ri(i),G] = [Xiji I ld(i) = Gj G Ri(i),G]\ = l/i| . 

By Lemma [13] part (a), if i is discovered at level I, then, \fii — fi\ < 

Since NOCOLLISION holds as a sub-event of G, |.Rz(0l ^ Let , js} be any s-subset 

of Ri{i) such that 1 < < ^2 < • • • < js < 2s and y G E be a code with vr^ : [k] — [k] 

being a random permutation. Let y = (yi,y 2 , ■ ■ ■ ,yk) be the A:-dimensional increasing sequence 
1 < yi < 1/2 < • • • < J/fc E 2s representing the k non-zero positions in the s-dimensional bit vector 
y. Then, ^iyi = {l)\fi\^~''irr=ii^i,jv„^,yi-\fi\)- Therefore, each G Ri(i), for 1 < r < A:. 

% [&iyl I ld(f) = I, G] 

k 


E 

k 

E 

17=0 


r=l 




\f^ 


.\P~'^ 


I kii) — ^ Rl(i) \fi 


r=l 
which by Corollary [2] is bounded above as follows




a 


1 — a 


fe+i 



p 


< 

< 


l/{26p) \ 


k+l 


\h\ 


l-l/{26p)J 


since k > 1000 log (n). 

Since $E[\bar{\vartheta}_i] = E[\vartheta_{iy}]$ for each $y \in Y$ and random permutation $\pi_y$, the lemma follows.
Additionally, if p is integral then $E[\bar{\vartheta}_i] = E[\vartheta_{iy}] = |f_i|^p$.

□ 


F.3.1 Probability that two items collide conditional on the event NOCOLLISION

We first prove a lemma that bounds the probability that two distinct items collide under a hash 
function hij conditional on j being in Ri{i). 

Lemma 47. Let $l_d(i) = l$, $k \in S_l$, $k \notin \mathrm{Topk}(C_l)$ and $i \ne k$. If the degree of independence of
the hash family from which the hash functions $h_{lj}$ are drawn is at least 11, then,


1. Pr 


'^ikji — 1 I ^d(,i') — ^ Riil)ik G Si,k ^ ToPK(t7;) 


G 1- 


16C, 


Ci-0.5T0.5 


±2 


1 


Cl 

t-lj \ 16Q 


t-i 


and. 


2. Pr 


Uikji = 1 1 ld{i) = l,j G Ri{i),k G Si,k ^ Topk{Ci),G 


G 1- 


16C, 


Ci-i 


±2 


1 


Cl 

t-ij y i6Ci 




± 0{n 


Proof. Since Uikji = 1 is equivalent to hij{i) = hij{k), we have. 


Pp Uikji = 1 I ld{i) = l,j G Ri{i),k GSi,k ^ Topk(Q) 


Prt 


j G Ri{i) I Uikji = 1, ld{i) = l,k G Si,k ^ Topk(C'z) 


Pr, 


j G Riii) I ld{i) = l,k G Si,k ^ Topk(Q) 


•Pr, 


^ikjl — f I — I, k G Si, k ^ T0PK(C;) 


(106) 


First, 


Pr, 




1 I Idii) = l,kGSi,k^ f(^(C'i) 


— Pr, \uikji — 1] 



(107) 


since the event $u_{ikjl} = 1$ depends solely on $\bar{h}_{lj}$ and is independent of the events $k \in S_l$ and
$k \notin \mathrm{Topk}(C_l)$.

Secondly,


Prt 


j e Ri{i) I Uikji = 1, ld{i) = l,k e Si,k ^ Topk(C'/) 


Pd 

Vi' G ToPK(Ci) \ {i}ihijii') / hijii)) Uikji = 1, Idii) = l,k £ Si,k ^ ToPK(C'i) 


Pd 

(^Vi' G ToPK(C'i) \ {i}ihijii') / hijii))'j and Uikji = 1 Idii) = l,k £ Si,k ^ Topk(C'z) 

Pd 

'^ikji — 1 1 — l-jk Ei Sij k 0 Topk((7/) 



Prt 


\/i' G ToPK(C'i) \ {i}(%(i') / hij{i))\ and Uikji = 1 | Idi^} = l,k e Si,k ^ Topk(C'z) 


Pd — 1] 


(108) 


The event $u_{ikjl} = 1$ is a function solely of $\bar{h}_{lj}$ and it is independent of the events $l_d(i) = l$ and $k \notin \mathrm{Topk}(C_l)$.
Hence, the denominator term in Eqn. (108) is simply $\Pr_t[u_{ikjl} = 1]$.

Consider the numerator of Eqn. (108). Let $A = \mathrm{Topk}(C_l)$, $|A| = C_l$. Then,

Pr (yi' G ToPK(Ci) \ {i}ihij{i') / hij{i))^ and Uikji = 1 | Idii) = l,k e Si,k ^ Topk(C'z) 

= ^ Pd {yi'e A\{i}{hij{i') ^ hij{i))) ,Uikji = 1 \ Idii) = l,k e Si,k ^ A,Topk{Ci) = A 

Ac[n],\A\=Ci 

■ f^(Ci) = A I Idii) = l,keSi,k^A 

= P’^t [{yi'^ A\{i}{hijii') hijii))) ,Uikji = l]Pr Topk{Ci) = A\ Idii) = l,k £ Si,k ^ A 

Ac[n] 

\A\=Ci 

(109) 

since for a fixed A, the event $\{\forall i' \in A \setminus \{i\}\,(h_{lj}(i') \ne h_{lj}(i)) \text{ and } u_{ikjl} = 1\}$ is independent of the
events $l_d(i) = l$, $k \in S_l$ and $k \notin A$.

We now estimate the probability $\Pr_t[(\forall i' \in A \setminus \{i\}\,(h_{lj}(i') \ne h_{lj}(i))),\, u_{ikjl} = 1]$. The event $\{\forall i' \in A \setminus \{i\}\,(h_{lj}(i') \ne h_{lj}(i)),\, u_{ikjl} = 1\}$ is equivalent to $\left(\neg \bigvee_{i' \in A \setminus \{i\}} (u_{ii'jl} = 1)\right) \wedge (u_{ikjl} = 1)$. Therefore,
by inclusion-exclusion, we have,


Pd 


= Pd 


' \J iuii'jl — 1 ) j iuikjl — 1 ) 

2'eA\{i} 


' \/ iuii'jl — 1) I Uikji — 1 


Pd [Uikjl — 1] 


= 1 - Pd 


iUii'jl 1) I Uikji f 


Pd [Uikjl 1] 

Following the inclusion-exclusion arguments as in Lemma H2] and using the notation that that P[-] 
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denotes the probability measure assuming full-independence of the same hash family, we have, 


1- Prt 



1) I ^ikjl 1 

t-1 

1 I ^ikjl f 


- 1-P 


1) I ^ikjl 1 

i'eA\{i} 


<2 Y. P 


r=l 


< 2 


Cl 


1 


t-i 


^t-lj \ IQCi 

Therefore, 

Prt [(Vi' G A\{i}{hij{if) / hij{i))) ,Uikji = l] 

— I 1 \J — 1) I — 1 

i'eA\{i} 


1 - P 


= 1 - 


\J 1) I ^ikjl 1 

1 


['^ikjl — 1] 

(C, 


±2 


IGCiJ 


±2 


Cl 


1 


Vt - ly V 16 ^ 



Vt - ly V leCj 


Pi" {'^ikjl — 1] 


t-1 


Pi" [Uikjl — 1] 


Now, $C_l - \mathbf{1}_{i \in A} = C_l - 0.5 \mp 0.5$.

Substituting in Eqn. (109), we have,

Prt G ToPK(C't) \ {i}{hij(i') / hij{i))^ and Uikji = 1 | Idii) = l,k e Si,k ^ ToPK(C't) 

= X] ^ / hij{i))) and Uikji = l] 

Ac[n],\A\=k 


■ Pr - 


1 


/ / \ Ci-0.5T0.5 / 


ToPK(C't) = A I ld(i) = l,k € Si,k ^ A 

Pl^ [Uikjl — 1] 


Cl 



A(l[n\,\A\=k 


ToPK(C't) = A I ld(i) = l,k G Si,k ^ A 


U lecj [t-ijyi6Ci 



Pr [uikji = 1] 


( 110 ) 
Substituting in Eqn. (108), we have,


Prt 


j G Ri{i) I Uikji = 1, ld{i) = l,k e Si,k ^ Topk(C'z) 

1 


[Uikjl — 1 ] 


Prt 


Mi' G Topk(Q) \ / hij{i)) ) and Uikji = 1 


1 \ Cj-O.StO.S 


ld{i) = l,keSi,k^ Topk(C'0 

t-i 


Cl 

t - 1 


1 


WCi 


In a similar manner, we can show that 


Prt 


j G Riii) I ld{k) = l,kGSi,k^ Topk(C0 


= 1 - 


16 Ct) 


C(-0.5t0.5 


±2 


Cl 


16Cj 


Substituting Eqns. (Illip . (I112p and (IE. 3. ID in Eqn. (I106p . we have. 


Prt 


Uikji = 1 I kii) = l,j e Ri{i),k eSi,k ^ Topk(Q) 


Prt 


j G Ri{i) I Uikji = 1, Idii) = l,k e Si,k ^ Topk(C'/) 


Prt 


1 - 


16C, 


j G Ri{i) I ld{i) — l,k G Si, k 0 ToPK(C't) 

(i^) 


Uikji — I I ^dijk) — l,k G Si, k 0 TOPK(C7t) 
Cj-O.StO.S . ^ \ t-1 


1 - 


16 Ci 


Ci-0.5±0.5 


(ife) 


1 


-f—y 

16Ci \16t/ 


Eor t = 11, the above ratio is bounded by • 

Conditioning with respect to by Eact 031 the above_j^bability may change by n“ 
conditioned on G, we have that ld{i) = I implies that i G TopkPC;). Hence, 


Prt 


j G Ri{i) I Uikji = = l,k G Si,k ^ TOPK{Ci),g 


= 1 - 


16 Ct) 


Ci-l 


±2 


1 


Cl 

t-lj \ IQCi 


t-i 


zb n 


(111) 


( 112 ) 


(113) 


Also, 
Proceeding similarly as in Eqn. (113), we have,


Pp Uikji = 1 I ld{i) = l,j ^ Ri{i),k £Si,k ^ ToPK{Ci),g 


Pp 


j G Riii) I Uikji = l-,ld{k) = l,k e Si,k ^ ToPK{Ci),g 


Pp 


j G Riii) I Idii) = l,keSi,k^ ToPK{Ci),g 


Pp 


'^ikji — 1 I ^dik) — k G Si , k 0 ToPK((i7/), g 


/ / xCi-0.5=F0.5 s / 1 \i-l 

^“Tecr) =*= ^G-i) ( T6C7 ' ± n 


^ NCi-0.5±0.5 

6Cij 

For t = 11, the above ratio is bounded by ( 


\ leCi) =F 2(G0 (leCi) =F 


n 


' ±f::^V±n- 


16Q Vl6t/ 


□ 


F.4 Basic properties of the application of Taylor polynomial estimator: Proof
of Lemma 13, Part II

We now complete the proofs of the remaining parts of Lemma 13.

Proof of Lemma 13, parts (c), (d) and (f). Recall that $y_{lk}$ is an indicator variable that is 1 iff $k \in S_l$. Given that $i \in S_l$, the random variable $X_{ijl}$ is defined as

$$X_{ijl} = \Big(f_i + \sum_{k \ne i} f_k \cdot u_{ikjl} \cdot \xi_{lj}(k) \cdot \xi_{lj}(i) \cdot y_{lk}\Big)\,\mathrm{sgn}(\hat{f}_i) .$$

As shown in the proof of Lemma 13, $E_{\bar{\xi}}[X_{ijl} \mid j \in R_l(i), l_d(i) = l, \mathcal{G}] = |f_i|$. Further,

K[xii\jGRi{i),id{i) = i,g] 


— IR- 


^l,^[xli\jGRi{i),ld{k) = hQ] 


fi ~k^hHH,l®hij®g 


^ ^ fk ' Uijkl ■ yik I j G Riiif Idif) l^g 

fce[n]\{i} 


since the expectation with respect to the Rademacher family of the TPEst structure is independent of
the random bits used to define $\mathcal{G}$ and $R_l(i)$.
Therefore,


Now, 


[^ijl I J ^ Rl{i)Jd{i) — ^,Q] 

^ [^ijl I 3 ^ Rl{i)Jd{i) = ^,S] 

~ {^iijehHH,i®hijm I •?' ^ Ri{})iU{i) = liG]^ 

— fi ~^^g®hHH,l^hij fk ' '^ikjl ' Vlk \ j S Rl{i),ld{i) = l,G — \fi\ 

_fce[n]\{i} 

~ fk ■ ^’’^g^h.HH.l^hj — '^ \ 3 ^ Rl{f)^ Idij) = l-,k ^ Sl,Q\ 

fce[n]\{i} 

■ ^'^g®hHH,l®hij \yik = 1 I J G Rlii),ld{i) = l,G] ■ 


Pr [Uikji = 1 I j G Riii), ld{i) = l,k e Si, Q] 


= Pr 


= Pr 


Uikji = l,k ^ Topk(Cj) I j G Ri{i),k G Si,ld{i) = l,G 


+ Pr 


Uikji = l,k e Topk(C'z) I j G Ri{i),ld{i) = k,Q 


Uikji = 1 I j G Ri{i), k £ Si,k ^ Topk(C'/), ld{i) = l,G 


< 


/I + 10-1® 

V (leco 


Pr A: 0 Topk(Cj) | j G Ri{i), k G Si, Idif) — l,G +0 
Pr k^ ToPK(Ci) I j G Ri{i),k G Si,ld{i) = l,G 


by Lemma 47 with t = 11.

Substituting in (114), we have that


(jIi< Y1 fk 


fce[n]\{i} 

• Pr 


n +10-1® 
V ( 16 Q) 


k 0 Topk(C'z), A: G 5; | j G Ri{i),ld{i) =l,G 


< 


/I+ 10-1® 

V ( 16 Q) 

[I + 10 - 1 ® 


E 


/l-i 


V ( 16 Q) 

It can be shown that, conditional on GOODEST, 


ke[n]\{i},k&Si,k^T0PK(Ci) 


Fr 


pres 


,l)<9Fr{Ci,l) 


(114) 


(115) 


(116) 


(117) 


(this is explicitly proved in [18]; variants appear in earlier works for e.g., [HI [T3l [22]). Since, 
SMALLRES holds as a sub-event of G, {Ci,l) < l.SF^®® ((2a)*C) /2*-i. Therefore, Eqn. (11171) 
may be written as follows.


- 


1 + IQ-i® 
(16Q) 






< 9(1 + 10 ^^)Fr{Ci,l) I^ToPKi(Q), /j < 9Fr {Ci,l)) Hg [22] 


< 


< 


WCi 

9(1 + 10-^®))(1.5)F|®" {{2ayC) 

Q(16)2'-i 
9(1 + 10-16)(1.5)F2 


{0 implies smallres. 


8{2ayC 

< (17/10)(erz)2 . 

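This proves part (d) of Lemma 13.
Hence,

$$\eta_{ijl}^2 = |\hat{f}_i - f_i|^2 + \sigma_{ijl}^2 \le (\bar{\epsilon} T_l)^2 + (17/10)(\bar{\epsilon} T_l)^2 \le 2.7(\bar{\epsilon} T_l)^2 .$$

Since i is discovered at level l, $|\hat{f}_{il}| \ge Q_l = T_l(1 - \bar{\epsilon})$ and therefore, $|f_i| \ge Q_l - \bar{\epsilon} T_l = T_l(1 - 2\bar{\epsilon})$.

Hence,

$$\frac{|f_i|}{\eta_{ijl}} \ge \frac{T_l(1 - 2\bar{\epsilon})}{\sqrt{2.7}\,\bar{\epsilon} T_l} \ge 15p . \quad \text{Further,} \quad \frac{|\hat{f}_i|}{\eta_{ijl}} \ge \frac{T_l(1 - \bar{\epsilon})}{\sqrt{2.7}\,\bar{\epsilon} T_l} \ge 16p .$$

This proves parts (c) and (f). □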
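The two ratio bounds are simple arithmetic and can be sanity-checked numerically. The sketch below assumes the noise constant is 2.7 (the sum 1 + 17/10, which also matches the factor 2.7 used later in Section G.2) and that the accuracy parameter of the heavy-hitter estimates equals 1/(27p); the function name `signal_to_noise_bounds` is illustrative only.

```python
import math

def signal_to_noise_bounds(p):
    """Lower bounds on |f_i|/eta and |f_hat_i|/eta, in units where T_l = 1,
    assuming eps_bar = 1/(27p) and eta <= sqrt(2.7) * eps_bar * T_l."""
    eps_bar = 1.0 / (27 * p)
    eta_max = math.sqrt(2.7) * eps_bar
    true_ratio = (1 - 2 * eps_bar) / eta_max   # uses |f_i| >= T_l (1 - 2 eps_bar)
    est_ratio = (1 - eps_bar) / eta_max        # uses |f_hat_i| >= T_l (1 - eps_bar)
    return true_ratio, est_ratio

for p in (2, 2.5, 3, 10, 100):
    true_ratio, est_ratio = signal_to_noise_bounds(p)
    print(p, true_ratio >= 15 * p, est_ratio >= 16 * p)
```

For p = 2 the second ratio is about 32.25, so the bound 16p is nearly tight at the low end of the range.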


F.5 Taylor polynomial estimators are uncorrelated with respect to $\bar{\xi}$

Proof of Lemma 15. The expectations in this proof are only with respect to $\bar{\xi}$.

Consider $E_{\bar{\xi}}[\vartheta_i \vartheta_{i'} \mid \hat{f}_i, \hat{f}_{i'}, \mathcal{G}]$. The estimators $\vartheta_i$ and $\vartheta_{i'}$ each use the TPEst structure at levels $l_d(i)$ and $l_d(i')$ respectively.

If $l_d(i) \ne l_d(i')$ then the estimations are made from different structures and use independent random
bits and therefore,




I fiJi'G 


= % 


'&i I fi,G 




I fi,G 


Now suppose that $l_d(i) = l_d(i') = l$ (say). Then, $|\hat{f}_{il}| \ge Q_l$ and $|\hat{f}_{i'l}| \ge Q_l$. Since SMALLHH
holds as a sub-event of $\mathcal{G}$, $\{i, i'\} \subset \{k : |\hat{f}_{kl}| \ge Q_l\} \subset \mathrm{Topk}(C_l)$. Therefore, by NOCOLL$_l$, the
estimates $\vartheta_i$ and $\vartheta_{i'}$ are such that if $j \in R_l(i) \cap R_l(i')$, then $h_{lj}(i) \ne h_{lj}(i')$.

Let $q_1, q_2, \ldots, q_s$ be some permutation of the table indices in $R_l(i)$. Likewise let $q'_1, q'_2, \ldots, q'_s$ be a
permutation of the table indices in $R_l(i')$. Then,


E,- 


= % 


\ fi,fi',G 

k V ^ 




\v=0 id=l 

fiJi',G 


E 7.(l/*l)7.'(l/*'l)Ef 

v,v'=0 


\v'=0 


w'=l 


- i/*i) n - \fi'\) I lu,g 


_w=l 


w'=l 


(118) 


Consider E^ 


- i/*i)n:;'=i(^*',,c,( - m i fi,u,G 


. For some 1 < tc' < vk if 


q' , 0 {( 71 , q 2 , ■ ■ ■, qw}, then, the random variable X^, „/ ; — |/j/| uses only the random bits of f,i„i 
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and is independent of the random bits I 1 < used by any of the Xi^q^^i, for 1 < w < v. 

An analogous situation holds for any 1 < tj; < n such that Clearly, for distinct 

tables, j,j', ;] is the product of the individual expectations, by independence of the 

seeds of the Rademacher families {Cij{k)} and {Cij'{k)}. Therefore, 


Ef 


- i/*i) n I 

.w=l w'=l 


n 

^'■q',^{qi,-,qv} 


Uq' , 


n 

j&{qi,--;qv}r\{q'^,...,q'} 




We analyze XijiXi^ji \ fuJinQ ■ 


Ea, 


XijiXifji I fil, fi>i, Q = sgn{fi)sgn{fj 


•Ea. 


(^fi T iljijG 'y ^ fk ■ • ^fi' T CljG ) y ^ fk' ■ ^lj{k ) ■ Ui'k'jl 


k^i 

fil •) fi'l •} Q 


k'^i' 


(119) 


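Suppose we use linearity of expectation to expand the product and take the expectation of the
individual terms. The expectation of the terms of the form $E_{\bar{\xi}}[\xi_{lj}(i)\xi_{lj}(k)u_{ikjl}] = 0$ since $i \ne k$
and the random variable $u_{ikjl}$ is independent of $\bar{\xi}$. Similarly, $E_{\bar{\xi}}[\xi_{lj}(i')\xi_{lj}(k')u_{i'k'jl}] = 0$. We
also obtain a set of terms of the form $E_{\bar{\xi}}[\xi_{lj}(i) \cdot \xi_{lj}(i') \cdot \xi_{lj}(k) \cdot \xi_{lj}(k') \cdot u_{ikjl} \cdot u_{i'k'jl}]$. Since $j \in R_l(i) \cap R_l(i')$, we have $h_{lj}(i) \ne h_{lj}(i')$. Now $u_{ikjl} \cdot u_{i'k'jl} = 1$ only if $h_{lj}(i) = h_{lj}(k)$ and $h_{lj}(i') = h_{lj}(k')$. We
conclude that $\{i, i', k, k'\}$ are all distinct, and by 4-wise independence of the $\{\xi_{lj}(u)\}_{1 \le u \le n}$ family,
$E_{\bar{\xi}}[\xi_{lj}(i) \cdot \xi_{lj}(i') \cdot \xi_{lj}(k) \cdot \xi_{lj}(k') \cdot u_{ikjl} \cdot u_{i'k'jl}] = 0$. Therefore, Eqn. (119) becomes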




XijiXi'ji \ fil, fi/i,g = |/j||/j,| = E^ Xiji\fii,Q 




Xijl I fi'liQ 


It follows that 


E. 

= %, 


{Xiji - \fii\){Xi/ji - \ fiq\) I fii,fi'i,g = (i/ii - \ fii\)){\fi' \ - \ fi'i\) 


Xiji - \fii\ \ fii,G E^y Xiiji - \fifi\ 1 fii,G 


For = ld{i') = ^ (IllSp simplihes to 


Ef 


I fiufi'hG 


= Ef 


I fihG 


E, 


^i' I fi'hG 


Thus, $\vartheta_i$ and $\vartheta_{i'}$ are uncorrelated in all cases.
Since $\bar{\vartheta}_i$ is the average of the Taylor polynomial estimators $\vartheta_i$ for randomly chosen permutations, the variables $\bar{\vartheta}_i$ and $\bar{\vartheta}_{i'}$ are also uncorrelated in all cases, whether $l \ne l'$ or $l = l'$, that
is,




didi' 1 fii,fi'i',G 


= E, 


di I fii,G 


E, 


di' 1 fi'i',G 


( 120 ) 

□ 
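The step that kills the mixed terms relies only on the fact that a product of sign variables over distinct items has zero mean. A minimal sketch of that fact follows, using fully independent signs as a stand-in for the 4-wise independent family; the function name `mean_sign_product` is illustrative and not from the paper.

```python
from itertools import product

def mean_sign_product(indices, n_items):
    """Exact mean of prod_{idx in indices} xi[idx] over all sign vectors
    xi in {-1, +1}^n_items (repeated indices are allowed)."""
    total = 0
    for signs in product((-1, 1), repeat=n_items):
        term = 1
        for idx in indices:
            term *= signs[idx]
        total += term
    return total / 2 ** n_items

print(mean_sign_product([0, 1, 2, 3], 4))  # 0.0: four distinct items
print(mean_sign_product([0, 0, 1, 1], 4))  # 1.0: indices pair up
```

Only index patterns in which every item appears an even number of times survive, which is why the argument first establishes that the four items involved are all distinct.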


G Expectation and Variance of pth moment estimator 

In this section, we analyze the expectation and variance of the estimator $\hat{F}_p$.


G.1 Expectation of the $\hat{F}_p$ estimator

Proof of Lemma 16. Define level : $[n] \to \{0, 1, 2, \ldots, L+1\}$ to be the function that maps each item
$i \in [n]$ to the index of the group it belongs to, that is,

$$\mathrm{level}(i) = \begin{cases} l & \text{if } i \in G_l \\ L+1 & \text{if } f_i = 0 . \end{cases}$$


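Then, by definition of the $Y_i$'s,

$$E[\hat{F}_p \mid \mathcal{G}] = E\Big[\sum_{i \in [n]} Y_i \mid \mathcal{G}\Big]$$

$$= \sum_{l=0}^{L} \sum_{i \in G_l} \sum_{l'=0}^{L} 2^{l'} E[z_{il'} \bar{\vartheta}_i \mid \mathcal{G}]$$

$$= \sum_{l=0}^{L} \sum_{i \in G_l} \sum_{l'=0}^{L} 2^{l'} E[\bar{\vartheta}_i \mid i \in \bar{G}_{l'}, \mathcal{G}] \Pr[i \in \bar{G}_{l'} \mid \mathcal{G}]$$

$$= \sum_{l=0}^{L} \sum_{i \in G_l} \sum_{l'=0}^{L} 2^{l'} |f_i|^p (1 \pm n^{-4000p}) \Pr[i \in \bar{G}_{l'} \mid \mathcal{G}], \text{ by the lemma of Section F.3 on the expectation of } \bar{\vartheta}_i$$

$$= \sum_{l=0}^{L} \sum_{i \in G_l} |f_i|^p (1 \pm n^{-4000p}) \sum_{l'=0}^{L} 2^{l'} \Pr[i \in \bar{G}_{l'} \mid \mathcal{G}]$$

$$= \sum_{l=0}^{L} \sum_{i \in G_l} |f_i|^p (1 \pm n^{-4000p})(1 \pm O(2^{\mathrm{level}(i)} n^{-c})), \text{ by Lemma 8}$$

$$= F_p (1 \pm O(2^{L+1} n^{-c})) .$$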


(27p)^e“^ 

Let G = K''n}~‘^/P where K’ = — , _ - —-, as given in Figure [2j 

min(e‘^T log re) 

Since a = 1 - (1 - 2/p)(0.01) > 0.99, 

L = riog2„(n/G)] < 1 -t logi 98(re/G) <1 + (1.02) log2(n/G) <1-1- (1.02) log2(n^/^/iL') . 
Hence,


1.02 


2 ^ < 2 



and so 0{2^^^n = 0{n proving the lemma. 


□ 


G.2 Variance of $Y_i$

In this section, we calculate $\mathrm{Var}[Y_i]$. For sake of completeness we first present proofs of some
identities stated in Eqn. (8).

Fact 48. For any $p > q$, $F_q \le n^{1-q/p} F_p^{q/p}$. In particular, $F_2 \le n^{1-2/p} F_p^{2/p}$ for any $p > 2$.

Proof. Let X be a random variable that takes the value $|f_i|^q$ with probability 1/n, for $i \in [n]$.
Then,

$$E[X] = \frac{F_q}{n} .$$

By Jensen's inequality, for any function f that is convex over the support of X, $E[f(X)] \ge f(E[X])$.
Choose $f(t) = t^{p/q}$. Since $p > q$ and the support of X is $\mathbb{R}_{\ge 0}$, f(t) is convex in this range. Therefore,
$E[f(X)] = F_p/n$. By Jensen's inequality applied to f, we have,

$$\left(\frac{F_q}{n}\right)^{p/q} \le \frac{F_p}{n}, \quad \text{or,} \quad F_q \le n^{1-q/p} F_p^{q/p} .$$

□
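Fact 48 is easy to check numerically on a small frequency vector. The sketch below is only an illustration; the helper names `moment` and `fact48_holds` and the sample vectors are hypothetical, and a tiny relative tolerance is included to absorb floating-point rounding in the equality case.

```python
def moment(freqs, q):
    """q-th frequency moment: sum of |f_i|^q."""
    return sum(abs(x) ** q for x in freqs)

def fact48_holds(freqs, q, p):
    """Check F_q <= n^(1 - q/p) * F_p^(q/p) for p > q > 0."""
    n = len(freqs)
    lhs = moment(freqs, q)
    rhs = n ** (1 - q / p) * moment(freqs, p) ** (q / p)
    return lhs <= rhs * (1 + 1e-12)

print(fact48_holds([3, -1, 4, 1, -5, 9, 2, -6], 2, 3))  # True
print(fact48_holds([2, 2, 2, 2], 2, 4))                 # True (equality case)
```

Equality holds when all frequencies have the same magnitude, which is the case where Jensen's inequality is tight.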

In the following proofs, we will use the notion that the sample group of an item is consistent
with the frequency of the item to mean that if $i \in G_l$ and i is sampled into $\bar{G}_r$, then l and r are
related as given by Lemma 8 conditional on $\mathcal{G}$. (For example, if $i \in \mathrm{lmargin}(G_l)$, then $r \in \{l, l+1\}$; if
$i \in \mathrm{mid}(G_l)$, then $r = l$; and if $i \in \mathrm{rmargin}(G_l)$, then $r \in \{l-1, l\}$.)

Proof of Lemma [3 For this proof, assume that Q holds. 

Case 1: i G mid(Go). Then i G Go with probability 1 and ld{i) = 0. Therefore, 


L-l 

Hi = ^ 2' • Zii ■ Hu = Hio 
1=0 

since, Zio = I and zu = 0 for ^ > 0. Let Hi denote Hio. Therefore, Var [Yi \ Q] = Var [Hi \ Q'\ . 

From Figure[2l we have, G = {27p)^B > [27p)"^Ke~‘^n^~^^P/ log{n). Since the estimator Hi uses 
the TPEST structure at level 0, by Lemma flJl (part (b)), we have, p = E [Vjjo | Q] = \fi\ and by 
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part (iv) of the same lemma, rjf-Q < {2.7)F2/C, for each j G Roii)- Therefore, by Lemma [H 
Var Go,g] < 


< 


< 


< 


(0.288y|/,pP-^ \ /2.7F2\ 

(iooo)(iog n) y ^ C y 

(0.288y|/,pP-^ \ / (2.7)(1.0005)F2 

(1000)(log ri) y \(27)p‘^K€~^n^~^/P/{log{n)) 

i0.3)e^\ 

{10)^K 


( 121 ) 


where the last step uses the fact that $F_2 \le F_p^{2/p} n^{1-2/p}$, for $p > 2$ from (8), and that $\hat{F}_2 \le (1 + 0.01/(2p)) F_2$.

Case 2: i G lmargin{Go) Ll^=i Gr- If i G Gi, then, ld{i) G {1,1 — 1} and if z G then 

^ — 1 < r < / + 1. By Lemma [T3l Piji^[i) < l/il/(15p) for j G Ri{i). From Lemma [H we have. 


Var [z9. I z G Gr,G] = 




(750)/c 


Hence, 


Var \Yi I g] = Var 

L 


^2^^air I g 


.r=Q 


^ 2^'’Var [VjZjr] + ^ 2’'+^'Cov 


r=0 

L 


0<r,r'<I/ 

r^r' 


22''Var 


( 122 ) 


r=0 


The last step follows since $z_{ir} \cdot z_{ir'} = 0$ whenever $r \ne r'$, since i may be in only one sampled group.
Simplifying (122), we have,


Var [diZir] < E [&‘izir] = E [?9^ | Zir = l] Pr [Zir = 1] 


(123) 


Assuming that r is a level that is consistent with z (otherwise Pr [zir = 1 \ g] =0), we have, by 
Lemma [13] that E [z?, | g~\ G |/i|^(l ± d) where, < l/il/(15p), for j G Ri{i). Using LemmaE] 

we obtain, 

E [df 1 Zir = 1, = Var [z?j | Zir = l,g] + (E [■!?* | Zir = l,g])‘^ 


< 




< |/ipP(L001) 


(124) 
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where, 6 < n 

Substituting (124) and (123) into (122), we have,

L 

Var [y, I a] < 22'-E [^U^r \ G] 


r=0 


< 


< 


|/.P^’(1.001)^22^Pr[iGa 16^] 


r=0 


2'+^(1.001)l/i|2p^2''Pr[iGa|e] 


r=0 


< (1.001)2'+i|/i|2P(l + 5) 

< (1.002)2'+^ l/ip?* 


(125) 


□ 


Step 2 uses (124). Step 3 uses Lemma 5 to argue that if $i \in G_l$, then $\Pr[i \in \bar{G}_r \mid \mathcal{G}] = 0$ for all
$r > l+1$. Hence, the summation from $r = 0$ to L is equivalent to r ranging over $l-1$, $l$ and $l+1$.
So the term $2^{2r} \le 2^{l+1} 2^{r}$. The last step again uses Lemma 8 to note that $\sum_{r} 2^{r} \Pr[i \in \bar{G}_r \mid \mathcal{G}] = 1 \pm O(2^{l} n^{-c})$.

G.3 Covariance of $Y_i$ and $Y_j$

Proof of Lemma 18. Let $i \ne j$, $i \in G_l$ and $j \in G_m$.

$$\mathrm{Cov}(Y_i, Y_j \mid \mathcal{G}) = E[Y_i Y_j \mid \mathcal{G}] - E[Y_i \mid \mathcal{G}]\,E[Y_j \mid \mathcal{G}]$$


= E 


2'^Zir^i 2'’ Zjr''&j I g 


_r=0 


r '=0 


-E 


Y,‘^"Zirh I g 


_r=0 


E 


2 I g 


Lr -'=0 


2^+'’ E [DiDj I Zir = 1, Zjr' = 1, g] Pr [Zir = 1, = 1 I 

- 2 ^+"'E \ Zir = l\g]E [^j I = 1, a] Pr [Zir = 1 I g] Pr [zjr' = l\g] 

2’'+'-' y] E [Mj I fi, fj ,Zir = l, Zjr' = l,g 


0<r,r^<L fiifj 


• Pr 




Pr ^Zi'p — 1, Zj'p' — 1 I 


2r+r' lYE[^^\^,Zir = l,g 


Pr 


fi I Zir — 1) ^ 


Pr [zir = l\ g] 


h 


y^E I fj,Zjr' = l,g Pr fj 1 Zjv=i,0 Pr = 1 I 


(126) 


By Lemma [151 


E 


I fiifjjZir — IjZjY' — 1)0 


= E 


I fiifjtZir — l,Zj>' — 1)0 'IE Dj I fi,fj,Zir — 1, Zjr' — 1)0 


(127) 
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By Lemma [T^ for any value of /* satisfying Q and i € Gr such that r is consistent with | /j |, we 
have, 

fi,Z^r = l,E',g] =\fi\P{l±d) 

where, E' is any subset (including the empty subset) of the events {fjAzjr' = 1} and 6 = 
Substituting in (I127p and for r,r' consistent with |/j| and \fj\ respectively, we have, 


E 


1 fiJj,Zir = l,Zjr' = 1,^ = l/iri/?T(l ± 0(5)) 


In a similar manner, it follows that 


E 


I /i) Zir — 1) ^ 


E 


^ 7 I fj 1 Zjr — 1) ^ 


= \MP\f,\P{l±0{6)) 


Substituting these into (I126p . we have, 

E [YiYj \g]-lK[Yi\g]E [Y^ \ g] 

[2'-+^'l/iri/jr(l ± 0(5)) ^ Pr [fij, I Zir = l,Zjr> = l,g]- Pr [Zir = l,Z,r' = 1 | 0 ] 

fiJj 


0<r,r'<L 
r, r' consistent 
with 2 , j resp. 


- 2 


'{\MP\fj\P{l±0{5))) (^Pr [fi I Zi, = l] Pr[zi, = l,g 

h 

■ (X] fj I Zjr'=l,G Pr [zjr' = 1 I ^]) 

fi 

[2’'+^'|/*n/.r(l±0(5))Pr [z,r = I,Zjr' = 1 I e] 

0<r,r'<L 
r, r' consistent 
with 2 , j resp. 

- ± 0(5))Pr [zir = 1 I 0] Pr [zjr' = 1 | 0] 


(128) 


since, each of the summations, namely, (a) /. Pr fi^fj I Zir = = 1,0 , (b) Ylf Pr[/i 


Zir = 1,0] and (c) Ylfj Pr fj I ^^>'= 1,0 
Further, 


are 1 respectively. 


^ 2''+^'Pr [zir = 1, Zjr' = 1 I 0] = 2"+^'Pr [zir = 1, Zjr' = 1 | 0] 

0<r,r'<L 


0<r,r'<L 
r, r' consistent 
with i,j resp. 


= 1 ± 0(2^ + 2”^)n by Lemma 06] 


since, if levels r and r' are not consistent with $|f_i|$ and $|f_j|$ respectively, then $\Pr[z_{ir} = 1, z_{jr'} = 1 \mid \mathcal{G}] = 0$. The same applies to summations over r of $2^{r}\Pr[z_{ir} = 1 \mid \mathcal{G}]$, etc.

By Lemma 8 (part 4),


^2^Pr[z,r = 1 I 0] = 2^ ■Pr[zir = l\g] = l±2^n 


l^—c 


r=0 


r consistent 
with 2 
Similarly, $\sum_{r' \text{ consistent with } j} 2^{r'} \Pr[z_{jr'} = 1 \mid \mathcal{G}] = 1 \pm 2^{m} n^{-c}$. Combining and taking absolute values
of both sides in (128) and replacing equality by $\le$, we have,

$$|\mathrm{Cov}(Y_i, Y_j \mid \mathcal{G})|$$

$$\le |f_i|^p |f_j|^p \Big((1 \pm O(\delta))(1 \pm O(2^{l} + 2^{m}) n^{-c}) - (1 \pm O(\delta))(1 \pm O(2^{l} n^{-c}))(1 \pm O(2^{m} n^{-c}))\Big)$$

$$= |f_i|^p |f_j|^p \, O(\delta + (2^{l} + 2^{m}) n^{-c})$$

□


G.4 Variance of $\hat{F}_p$ estimator

Proof of Lemma\TR Let K = 425, so that B = Kn^~‘^/P€~'^f min(log(n), 
We have. 


Var[Fp] = Var 


Ehie 


le n 


n 


—c+l\ 


< ^ Var [y, I g] + ^|Cov(yi,y, | g)\ 

ie[n] i^j 

< Y 1 Var[y, |g]+ ^ Var[y, I g] + F2.o(, 

iSmid(Go) iS[n],i^mid(Go) 

5 E (e E 2«(1.002)|/.|^-|+O(n-«)F, 


iSmid(Go) 


/=0 isG;,i^mid(Go) 


(129) 


Step 3 follows from Lemma 18, since,

$$\sum_{i \ne j} |\mathrm{Cov}(Y_i, Y_j \mid \mathcal{G})| \le \sum_{i \ne j} O(n^{-c+1}) |f_i|^p |f_j|^p \le O(n^{-c+1} F_p^2) .$$


Step 4 uses Lemma fT71 Since, F 2 < ^ 2(1 + 0.01/(2p)), < (1.01)^2- Also, F 2 < Fp^'^vf 

Therefore, 




e^Fp . 


(130) 


For any set $S \subset [n]$ and $q \ge 0$, let $F_q(S)$ denote $\sum_{i \in S} |f_i|^q$.
Let $i \in \mathrm{lmargin}(G_0) \cup \bigcup_{l=1}^{L} G_l$. By definitions of the parameters,


\h\ < 7]_i < 


1/,| <ro(l + e) < 


^2 (1 + ^ 

{2aY-^B 

F2[ 


1/2 


1 , Ml 

2p 


1/2 


B 


1 + 


27p 


if i £ Gi and I > 1, 


if i G Imargin(Go) • 
We consider the first summation term of Eqn. (129), that is,


E 

iSmid(Go) 


(0.3)e2|/,|2p-2i7p2/p 

(10)4^: 


(W)(425)) 


^ (iwS) 


(0.3)<"F„" 
((10)‘‘)(425) 


< 


(131) 


We now consider the second summation term of Eqn. (129), that is,

L 

Y 2'+i(1.002)|/i|2P 

1=0 iSG;,i^mid(Go) 

L 

< Y (132) 

2 Glmargin(Go) ^=1 '^^Gi 


We will consider the two summations in Eqn. (132) separately.

^ (2)(1.002)(ro(l + e-))P|/,r 

iSlmargin(Go) 

< (2)(1.002) (1.01)e^/^'^Ep(lmargin(Go)) 

/ ^l-2lpjp2lp \p/2 

- ( (425)ni-2/Pc-4/p J 

= • ^p(lmargin(Go)) 

We now consider the second summation of Eqn. (132).


EE 

1=1 i^Gi 

L 

<(2)(1.01) J]2' 
1=1 
L 

= (4)(1.01)^2' 

i=i 


F2 


{2aY-^B 


p /2 


Fp{Gi) 




p /2 


(425)(2a)* 2 /pg 4 /p 

L 


Fp{Gi) 




1=1 


(133) 


( 134 ) 


Eurther, 


2«(2a)(^-i)(-P/2) = {2afG2^(2a)-^PG = ^2a)PG2^(^-(p/2)iog2(2a)) _ 
Let $\gamma = 1 - \alpha = (1 - 2/p)\nu$, where $\nu = 0.01$. Therefore,

$$\log_2(2\alpha) = 1 + \log_2(\alpha) = 1 + \frac{\ln(\alpha)}{\ln 2} = 1 + \frac{\ln(1 - \gamma)}{\ln 2} \ge 1 - \frac{2\gamma}{\ln 2} = 1 - \frac{2(1 - 2/p)\nu}{\ln 2} \ge 1 - (1 - 2/p)(3\nu) \qquad (136)$$

Using Eqn. (136), we can simplify the term $1 - (p/2)\log_2(2\alpha)$ as

$$1 - (p/2)\log_2(2\alpha) \le 1 - (p/2)\big(1 - (1 - 2/p)(3\nu)\big) = -(p/2 - 1)(1 - 3\nu) = -(p/2 - 1)(0.97) < 0$$

and is a constant.
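The inequality on the exponent can be checked numerically for a few values of p. This is a sketch under the stated parameter choice (the decay factor equals 1 minus (1 - 2/p) times 0.01); the function name `level_exponent` is illustrative only.

```python
import math

def level_exponent(p, nu=0.01):
    """Exponent 1 - (p/2) * log2(2 * alpha) with alpha = 1 - (1 - 2/p) * nu."""
    alpha = 1 - (1 - 2 / p) * nu
    return 1 - (p / 2) * math.log2(2 * alpha)

for p in (2.5, 3, 4, 10, 50):
    bound = -(p / 2 - 1) * 0.97
    print(p, round(level_exponent(p), 4), level_exponent(p) <= bound)
```

A strictly negative exponent is what makes the per-level contributions decay geometrically in the level index, so the sum over levels is dominated by a constant multiple of its first term.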

Substituting this into Eqn. (135) and then into (134), we have,


EE 

l=l ieGi 

( (4.04) \ 2 


.IP 


< 




«)"('"i)(^/2)Ep(Gi) 


< (4.04) 


2a ^ 


425 J 

L 


MGi) 

1=1 




1=1 


< ^-jFp-FpiujLiGi) . 

Adding Eqns. (133) and (137), Eqn. (132) becomes


(137) 


^ ^ 2^+i(1.002)|/g|2P 

1=0 i£Gi,i^mid(Go) 

V 2 OO 


< 


e‘^Fp ■ Ep(lmargin(Go)) + Fp • Fp{u(L^Gi) 


2jp2 


< 


e^F 

53 


(138) 


Substituting Eqn. (131) and Eqn. (138) in Eqn. (129), we have, for n sufficiently large, that

$$\mathrm{Var}[\hat{F}_p] \le \frac{\epsilon^2 F_p^2}{50} .$$


□ 


G.5 Putting things together 

Proof of Theorem 1^01 Consider the Geometric-Hss algorithm using the parameters of Eigure [2j 
By Lemma [71 Q holds except with probability n~'^, where, c > 23. Erom Lemma [T^ Var[Ep] < 
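$\epsilon^2 F_p^2/50$. Using Chebyshev's inequality,

$$\Pr\big[|\hat{F}_p - E[\hat{F}_p]| \le (\epsilon/2) F_p \mid \mathcal{G}\big] \ge 1 - \frac{\mathrm{Var}[\hat{F}_p]}{((\epsilon/2) F_p)^2} \ge 1 - \frac{4}{50} \qquad (139)$$

By Lemma 16, $|E[\hat{F}_p \mid \mathcal{G}] - F_p| \le F_p (2^{L+1} n^{-c})$. Combining with Eqn. (139), by triangle inequality,
we have,

$$\Pr\big[|\hat{F}_p - F_p| \le ((\epsilon/2) + 2^{L+1} n^{-c}) F_p \mid \mathcal{G}\big] \ge 1 - \frac{4}{50}$$

which implies that

$$\Pr\big[|\hat{F}_p - F_p| \le \epsilon F_p \mid \mathcal{G}\big] \ge \frac{46}{50}$$

since $2^{L} \ll n$.

Since $\Pr[\mathcal{G}] \ge 1 - O(n^{-c})$, unconditioning w.r.t. $\mathcal{G}$, we have,

$$\Pr\big[|\hat{F}_p - F_p| \le \epsilon F_p\big] \ge \frac{46}{50}\,(1 - O(n^{-c})) \ge 0.9 .$$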
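The constant 4/50 in the Chebyshev step is plain arithmetic: the variance bound divided by the square of the allowed deviation. A small exact-arithmetic sketch follows; the sample values chosen for the accuracy parameter and for the moment are hypothetical, and `chebyshev_failure_bound` is an illustrative name.

```python
from fractions import Fraction

def chebyshev_failure_bound(variance, deviation):
    """Chebyshev: Pr[|Z - E[Z]| >= deviation] <= variance / deviation^2."""
    return variance / deviation ** 2

eps = Fraction(1, 10)      # hypothetical accuracy parameter
Fp = Fraction(1000)        # hypothetical value of the p-th moment
variance_bound = eps ** 2 * Fp ** 2 / 50
deviation = (eps / 2) * Fp

fail = chebyshev_failure_bound(variance_bound, deviation)
print(fail, 1 - fail)      # 2/25 23/25
```

The ratio does not depend on the chosen values: the accuracy parameter and the moment cancel, leaving a failure probability of at most 4/50 and hence a success probability of at least 46/50 before unconditioning.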


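The space required at level 0 is $C_0 s = Cs$, at level l it is $C_l s$ and at level L it is $16 C_L s$. Here,
$s = 8k = 8(1000)\log(n) = O(\log n)$. Further $C_l = 4\alpha^{l} C$. Thus, the total space is of the order of

$$\sum_{l=0}^{L} C_l s = O(\log n) \sum_{l=0}^{L} \alpha^{l} C \le O\!\left(\frac{C \log n}{1 - \alpha}\right) = O\!\left(\frac{n^{1-2/p}\,\epsilon^{-2} \log(n)}{(1 - 2/p)\,\nu \cdot \min(\log(n), \epsilon^{4/p - 2})}\right) .$$

The last expression for space may also be written as $O\big(n^{1-2/p}(\epsilon^{-2} + \epsilon^{-4/p}\log(n))\big)$.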
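The space bound rests on summing a geometric series of per-level table sizes. The sketch below compares the finite sum with its closed-form upper bound; it assumes every level uses 4 times the decay factor to the power l times C entries per table (ignoring the factor 16 at the last level), and the names and sample parameter values are illustrative only.

```python
def total_table_entries(C, s, alpha, L):
    """Sum over levels l = 0..L of C_l * s, with C_l = 4 * alpha**l * C."""
    return sum(4 * alpha ** l * C * s for l in range(L + 1))

def geometric_upper_bound(C, s, alpha):
    """Closed-form bound 4 * C * s / (1 - alpha) on the infinite series."""
    return 4 * C * s / (1 - alpha)

p, nu = 3, 0.01
alpha = 1 - (1 - 2 / p) * nu          # decay factor from the text
C, s, L = 1000, 80, 20                # hypothetical sizes
print(total_table_entries(C, s, alpha, L) <= geometric_upper_bound(C, s, alpha))  # True
```

Because the decay factor is close to 1, the factor 1/(1 - alpha) is a large constant (100/(1 - 2/p) here), but it does not depend on n or on the accuracy parameter.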

The time taken to process each stream update consists of applying the L hash functions
$g_1, \ldots, g_L$ to an item i. Each hash function is $O(\log n)$-wise independent and requires time $O(\log n)$
to evaluate it at a point. The time to evaluate $L = \log_{2\alpha}(n/C)$ functions is $O(\log^2 n)$. Additionally,
for each level, the hash values for i have to be computed for each of the s hash functions of the $\mathrm{HH}_l$
and $\mathrm{TPEst}_l$ structures. These hash functions are $O(1)$-wise independent, and they can collectively
be computed in $O(Ls) = O(\log^2 n)$ time. This proves the statement of the theorem.

□